TRANSGENIC FISH WITH
TISSUE-SPECIFIC EXPRESSION 



BACKGROUND OF THE INVENTION 

The disclosed invention is generally in the field of transgenic fish,
and more specifically in the area of transgenic fish exhibiting
tissue-specific expression of a transgene.

Transgenic technology has become an important tool for the study
of gene and promoter function (Hanahan, Science 246:1265-75 (1989);
Jaenisch, Science 240:1468-74 (1988)). The ability to express, and study
the expression of, genes in whole animals can be facilitated by the use of
transgenic animals. Transgenic technology is also a useful tool for cell
lineage analysis and for transplantation experiments. Studies on promoter
function or lineage analysis generally require the expression of a foreign
reporter gene, such as the bacterial gene lacZ. Expression of a reporter
gene can allow the identification of tissues harboring a transgene.
Typically, transgenic expression has been identified by in situ
hybridization or by histochemistry in fixed animals. Unfortunately, the
inability to easily detect transgene expression in living animals severely
limits the utility of this technology, particularly for lineage analysis.

An attractive paradigm for understanding the gene expression,
development, and genetics of animals, especially humans, is to study less
complex organisms, such as Escherichia coli, Drosophila, and
Caenorhabditis. The hope is that understanding these processes in
simple organisms will have relevance to similar processes in mammals
and humans. The tradeoff is to accept the disadvantage that an
experimental organism is only distantly related to humans in exchange for
the advantages of easy manipulation, fast generation times, and more
straightforward interpretation of results in the experimental organism.
The disadvantage of this tradeoff can be lessened by using an organism
that is as closely related as possible to mammals while retaining as many
as possible of the advantages of less complex organisms. The problem is to
identify suitable organisms for such studies and, more importantly, to
develop the tools necessary to manipulate such organisms.

Some examples of cell determination in invertebrates have been
shown to occur in progressive waves that are regulated by sequential
cascades of transcription factors. Much less is known about such
processes in vertebrates. An integrated approach combining
embryological, genetic, and molecular methods, such as that used to study
development in Drosophila (for example, Ghysen et al., Genes & Dev.
7:723-33 (1993)), would facilitate the identification of the molecular
mechanisms involved in specifying neuronal fates in vertebrates, but such
an approach has been hampered by a lack of robust genetic and molecular
tools for use in vertebrates.

Transgenic technology has been applied to fish for various 
purposes. For example, transgenic technology has been applied to several 
commercially important varieties of fish, primarily in an attempt to 
improve their cultivation. The use of transgenic technology in fish has 
been reviewed by Moav, Israel J. of Zoology 40:441-466 (1994), Chen et
al., Zoological Studies 34:215-234 (1995), and Iyengar et al., Transgenic
Res. 5:147-166 (1996).

Stuart et al., Development 103:403-412 (1988), describe
integration of foreign DNA into zebrafish, but no expression was
observed. Stuart et al., Development 109:577-584 (1990), describe
expression of a transgene in zebrafish from SV40 and Rous sarcoma virus
transcription regulatory sequences. Although expression was seen in a
pattern of tissues, the expression within a given tissue was variegated.
Also, since Stuart et al. (1990) selected transgenics by expression and not
by the presence of the transgene, non-expressing transgenics would have
been missed by their analysis. Culp et al., Proc. Natl. Acad. Sci. USA
88:7953-7957 (1991), describe integration and germ line transmission of
DNA in zebrafish. Although the constructs used included the Rous
sarcoma virus LTR or SV40 enhancer-promoter linked to a lacZ gene, no
expression was observed. Bayer and Campos-Ortega, Development
115:421-426 (1992), describe integration and expression in zebrafish of a
lacZ transgene having a minimal promoter (a mouse heat shock
promoter) but no upstream regulatory sequences. The expression
obtained depended on the site of integration, indicating that endogenous
sequences at the site of integration in the fish were responsible for
expression. Westerfield et al., Genes & Development 6:591-598 (1992),
describe transient expression in zebrafish of β-galactosidase from mouse
and human Hox gene promoters. Lin et al., Dev. Biology 161:77-83
(1994), describe transgenic expression of lacZ in living zebrafish
embryos. The transgene linked the enhancer-promoter of the Xenopus
elongation factor 1α gene with the lacZ coding sequence. Different lines
of transgenic fish exhibited different patterns of expression, indicating
that the site of integration may be affecting the pattern of expression.
Amsterdam et al., Dev. Biology 171:123-129 (1995), and Amsterdam et
al., Gene 173:99-103 (1996), describe transgenic expression of green
fluorescent protein (GFP) in zebrafish. The transgene linked the
enhancer-promoter of the Xenopus elongation factor 1α gene with the
GFP coding sequence. As in Lin et al., Dev. Biology 161:77-83 (1994),
different lines of transgenic fish exhibited different patterns of
expression, indicating that the site of integration may be affecting the
pattern of expression. Although some of the systems described above
exhibited patterned expression, none resulted in the transmission of stable
tissue-specific expression of a transgene in zebrafish.

It is an object of the present invention to provide transgenic fish
having tissue- and developmentally-specific expression of transgenes.

It is another object of the present invention to provide a method 
of making transgenic fish having tissue- and developmentally-specific 
expression of transgenes. 

It is another object of the present invention to provide a method 
of identifying compounds that affect expression of fish genes of interest. 

It is another object of the present invention to provide a method 
of identifying the pattern of expression of fish genes of interest. 

It is another object of the present invention to provide a method
of identifying genes that affect expression of fish genes of interest. 

It is another object of the present invention to provide a method 
of genetically marking mutant fish genes. 

It is another object of the present invention to provide a method 
of identifying fish that have inherited a mutant gene. 

It is another object of the present invention to provide a method 
of identifying enhancers and other regulatory sequences in fish.

It is another object of the present invention to provide a construct 
that exhibits tissue- and developmentally-specific expression in fish. 

BRIEF SUMMARY OF THE INVENTION 
Disclosed are transgenic fish, and a method of making transgenic
fish, which express transgenes in stable and predictable tissue- or
developmentally-specific patterns. The transgenic fish contain transgene
constructs with homologous expression sequences. Also disclosed are
methods of using such transgenic fish. Such expression of transgenes
allows the study of developmental processes, the relationship of cell
lineages, the assessment of the effect of specific genes and compounds on
the development or maintenance of specific tissues or cell lineages, and
the maintenance of lines of fish bearing mutant genes.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A shows the nucleotide sequence at the exon/intron
junctions of the zebrafish GATA-1 locus. The conserved splice sequences
are underlined and the intron sequences are listed within parentheses.
The amino acids encoded by the exon regions flanking the introns are
shown beneath the nucleotide sequence. The upstream splice junction
nucleotide sequences are SEQ ID NO:6 (IVS-1), SEQ ID NO:7 (IVS-2),
SEQ ID NO:8 (IVS-3), and SEQ ID NO:9 (IVS-4). The downstream
splice junction nucleotide sequences are SEQ ID NO:10 (IVS-1), SEQ ID
NO:11 (IVS-2), SEQ ID NO:12 (IVS-3), and SEQ ID NO:13 (IVS-4).
The amino acid sequences spanning the introns are SEQ ID NO:14
(IVS-1), SEQ ID NO:15 (IVS-2), SEQ ID NO:16 (IVS-3), and SEQ ID
NO:17 (IVS-4).

Figure 1B is a diagram of the structure of the zebrafish GATA-1
locus. Exon regions are filled. Intron regions are unfilled. The tall
filled boxes represent the coding regions. The arrow indicates the
putative transcription start site. EcoRI endonuclease sites are labeled E.
BglII endonuclease sites are labeled G. BamHI endonuclease sites are
labeled B.

Figure 2 is a diagram of the structures of three GATA-1/GFP
transgene constructs used to make transgenic fish. The filled region to
the right of the GM2 box in each construct represents the 5.4 kb or 5.6
kb region of the GATA-1 locus upstream of the GATA-1 coding region.
The box labeled GM2 represents a sequence encoding the modified green
fluorescent protein. The thin angled lines in constructs (1) and (3)
represent vector or linking sequences. EcoRI endonuclease sites are
labeled E. BglII endonuclease sites are labeled G. BamHI endonuclease
sites are labeled B. In construct (3), the BamHI/EcoRI fragment on the
right side is the downstream BamHI/EcoRI fragment of the GATA-1
locus.

Figure 3 is a diagram of the structures of GATA-2/GFP transgene
constructs for analyzing the expression sequences of the GATA-2 gene.
The line represents all or upstream-deleted portions of a 7.3 kb region
upstream of the translation start site in the zebrafish GATA-2 gene. The
hatched box represents a segment encoding the modified GFP and
including an SV40 polyadenylation signal. Tick marks labeled P, Sa, A,
C, and Sc indicate the restriction sites PstI, SacI, AatII, ClaI, and ScaI,
respectively, in the 7.3 kb region.

Figure 4 is a diagram of the structures of GATA-2/GFP transgene 
constructs for analyzing the expression sequences of the GATA-2 gene. 
The thick open box represents a 1116 bp fragment of the upstream region
of the GATA-2 gene required for neuron-specific expression. The thin
open box represents segments of the upstream region of the GATA-2 gene
proximal to the transcription start site. The thick line represents the
promoter of the Xenopus elongation factor 1α gene. The hatched
box represents a segment encoding the modified GFP and including an
SV40 polyadenylation signal.

Figure 5 is a graph of the percent of embryos microinjected with
the transgene constructs shown in Figure 4 that expressed GFP in
neurons.

Figure 6 is a graph of the percent of embryos microinjected with
transgene constructs that expressed GFP in neurons. The transgene
constructs were nsP5-GM2 and truncated forms of nsP5-GM2.

Figure 7 is a graph of the percent of embryos microinjected with 
transgene constructs that expressed GFP in neurons. The transgene 
constructs were mutant forms of the ns3831 truncation of nsP5-GM2.

DETAILED DESCRIPTION OF THE INVENTION

Disclosed are transgenic fish, and a method of making transgenic
fish, which express transgenes in stable and predictable tissue- or
developmentally-specific patterns. Also disclosed are methods of using
such transgenic fish. Such expression of transgenes allows the study of
developmental processes, the relationship of cell lineages, the assessment
of the effect of specific genes and compounds on the development or
maintenance of specific tissues or cell lineages, and the maintenance of
lines of fish bearing mutant genes. The disclosed transgenic fish are
characterized by homologous expression sequences in an exogenous
construct introduced into the fish or a progenitor of the fish.

As used herein, transgenic fish refers to fish, or progeny of a
fish, into which an exogenous construct has been introduced. A fish into
which a construct has been introduced includes fish which have developed
from embryonic cells into which the construct has been introduced. As
used herein, an exogenous construct is a nucleic acid that is artificially
introduced, or was originally artificially introduced, into an animal. The
term artificial introduction is intended to exclude introduction of a
construct through normal reproduction or genetic crosses. That is, the
original introduction of a gene or trait into a line or strain of animal by
cross breeding is intended to be excluded. However, fish produced by
transfer, through normal breeding, of an exogenous construct (that is, a
construct that was originally artificially introduced) from a fish containing
the construct are considered to contain an exogenous construct. Such fish
are progeny of fish into which the exogenous construct has been
introduced. As used herein, progeny of a fish are any fish which are
descended from the fish by sexual reproduction or cloning, and from
which genetic material has been inherited. In this context, cloning refers
to production of a genetically identical fish from DNA, a cell, or cells of
the fish. The fish from which another fish is descended is referred to as a
progenitor fish. As used herein, development of a fish from a cell or
cells (embryonic cells, for example), or development of a cell or cells into
a fish, refers to the developmental process by which fertilized egg cells or
embryonic cells (and their progeny) grow, divide, and differentiate to
form an adult fish.



The examples illustrate the manner in which transgenic fish
exhibiting cell lineage-specific expression can be made and used. The
transgenic fish described in the examples, and the transgene constructs
used, are particularly useful for early detection of fish expressing the
transgene, the study of erythroid cell development, the study of neuronal
development, and as a reporter for genetically linked mutant genes.

Tissue-, developmental stage-, or cell lineage-specific expression
of a reporter gene from a regulated promoter in the disclosed transgenic
fish can be useful for identifying the pattern of expression of the gene
from which the promoter is derived. Such expression can also allow
study of the pattern of development of a cell lineage. As used herein,
tissue-specific expression refers to expression substantially limited to
specific tissue types. Tissue-specific expression is not necessarily limited
to expression in a single tissue but includes expression limited to one or
more specific tissues. As used herein, developmental stage-specific
expression refers to expression substantially limited to specific
developmental stages. Developmental stage-specific expression is not
necessarily limited to expression at a single developmental stage but
includes expression limited to one or more specific developmental stages.
As used herein, cell lineage-specific expression refers to expression
substantially limited to specific cell lineages. As used herein, cell lineage
refers to a group of cells that are descended from a particular cell or
group of cells. In development, for example, newly specialized or
differentiated cells can give rise to cell lineages. Cell lineage-specific
expression is not necessarily limited to expression in a single cell lineage
but includes expression limited to one or more specific cell lineages. All
of these types of specific expression can operate in the same gene. For
example, a developmentally regulated gene can be both expressed at
specific developmental stages and limited to specific tissues. As used
herein, the pattern of expression of a gene refers to the tissues,
developmental stages, cell lineages, or combinations of these in or at
which the gene is expressed.
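The definitions above can be made concrete with a small data model. The sketch below is illustrative only: it assumes a hypothetical record format of (tissue, stage, expressed) observations, and the 0.9 cutoff standing in for "substantially limited" is an arbitrary assumption, not a value given in this description.

```python
from collections import Counter

# Each observation is a (tissue, stage, expressed) tuple; this record
# format is a hypothetical choice made for illustration.


def expression_pattern(observations):
    """Return the set of (tissue, stage) pairs in which expression was seen."""
    return {(tissue, stage) for tissue, stage, expressed in observations if expressed}


def is_tissue_specific(observations, target_tissues, threshold=0.9):
    """Return True if expression is 'substantially limited' to target_tissues.

    The threshold is an assumed stand-in for 'substantially limited': at
    least this fraction of positive observations must fall in the target
    tissues.
    """
    positives = Counter(tissue for tissue, _stage, expressed in observations if expressed)
    total = sum(positives.values())
    if total == 0:
        return False  # no expression observed, so no pattern to classify
    in_target = sum(n for tissue, n in positives.items() if tissue in target_tissues)
    return in_target / total >= threshold


# Invented observations for a reporter expressed only in blood cells.
obs = [
    ("blood", "24 h", True),
    ("blood", "48 h", True),
    ("neurons", "24 h", False),
    ("notochord", "48 h", False),
]
```

Here `is_tissue_specific(obs, {"blood"})` returns `True`, and `expression_pattern(obs)` returns the two blood entries. Because `target_tissues` is a set, the same check covers expression limited to one or more specific tissues; a cell lineage-specific check would group observations by lineage instead of by tissue.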
1. Transgene Constructs 

Transgene constructs are the genetic material that is introduced
into fish to produce a transgenic fish. Such constructs are artificially
introduced into fish. The manner of introduction, and, often, the
structure of a transgene construct, render such a transgene construct an
exogenous construct. Although a transgene construct can be made up of
any nucleic acid sequences, for use in the disclosed transgenic fish it is
preferred that the transgene constructs combine expression sequences
operably linked to a sequence encoding an expression product. The
transgene construct will also preferably include other components that aid
expression, stability, or integration of the construct into the genome of a
fish. As used herein, components of a transgene construct referred to as
being operably linked or operatively linked refer to components being so
connected as to allow them to function together for their intended
purpose. For example, a promoter and a coding region are operably
linked if the promoter can function to result in transcription of the coding
region.

A. Expression Sequences 

Expression sequences are used in the disclosed transgene 
constructs to mediate expression of an expression product encoded by the 
construct. As used herein, expression sequences include promoters, 
upstream elements, enhancers, and response elements. It is preferred that 
the expression sequences used in the disclosed constructs be homologous 
expression sequences. As used herein, in reference to components of 
transgene constructs used in the disclosed transgenic fish, homologous 
indicates that the component is native to or derived from the species or 
type of fish involved. Conversely, heterologous indicates that the 
component is neither native to nor derived from the species or type of fish 
involved. 
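A transgene construct can be modeled as an ordered list of components, each tagged with the species (or virus) it is native to or derived from, which makes the homologous/heterologous distinction mechanical. The sketch below is a hypothetical illustration loosely patterned on the GATA-2/GFP constructs of Figure 3; the `Component` class and the ordering rule in `has_expression_layout` are assumptions. Operable linkage is a functional property, so a check on component order is at best a rough proxy for it.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    kind: str    # "promoter", "enhancer", "coding", "intron", or "polyA"
    source: str  # species or virus the sequence is native to or derived from


def classify_origin(component, host_species):
    """'homologous' if native to/derived from the host species, else 'heterologous'."""
    return "homologous" if component.source == host_species else "heterologous"


def has_expression_layout(components):
    """Rough proxy for operable linkage (an assumption, not a functional test).

    Requires a promoter upstream of the coding region and every polyA signal
    downstream of it. Enhancers are not constrained, because they can act at
    no fixed distance and in either orientation.
    """
    kinds = [c.kind for c in components]
    if "coding" not in kinds:
        return False
    coding_at = kinds.index("coding")
    promoter_upstream = "promoter" in kinds[:coding_at]
    polya_downstream = all(i > coding_at for i, k in enumerate(kinds) if k == "polyA")
    return promoter_upstream and polya_downstream


# Hypothetical construct loosely patterned on the GATA-2/GFP constructs of
# Figure 3. The upstream region carries promoter (and enhancer) elements but
# is tagged "promoter" here for simplicity.
gata2_gfp = [
    Component("GATA-2 upstream region (7.3 kb)", "promoter", "Danio rerio"),
    Component("GM2 modified GFP", "coding", "Aequorea victoria"),
    Component("SV40 polyadenylation signal", "polyA", "SV40"),
]
```

For a zebrafish host, `[classify_origin(c, "Danio rerio") for c in gata2_gfp]` yields `["homologous", "heterologous", "heterologous"]`: only the expression sequences are homologous, which is the arrangement the description prefers.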

Two large-scale chemical mutagenesis screens recently produced
thousands of zebrafish mutants affecting development (Driever et al.,
Development 123:37-46 (1996); Haffter et al., Development 123:1-36
(1996)). The genes affected by these mutations and their expression
patterns are of significant interest for understanding the developmental
process. Therefore, expression sequences from these genes are preferred
for use as expression sequences in the disclosed constructs.

As used herein, expression sequences are divided into two main 
classes, promoters and enhancers. A promoter is generally a sequence or 
sequences of DNA that function when in a relatively fixed location in 
regard to the transcription start site. A promoter contains core elements 
required for basic interaction of RNA polymerase and transcription 
factors, and may contain upstream elements and response elements. 
Enhancer generally refers to a sequence of DNA that functions at no fixed 
distance from the transcription start site and can be in either orientation. 
Enhancers function to increase transcription from nearby promoters. 
Enhancers also often contain response elements that mediate the regulation 
of transcription. Promoters can also contain response elements that
mediate the regulation of transcription.

Enhancers often determine the regulation of expression of a gene.
This effect has been seen in so-called enhancer trap constructs, where a
construct containing a reporter gene operably linked to a promoter is
expressed only when the construct inserts into the domain of an enhancer
(O'Kane and Gehring, Proc. Natl. Acad. Sci. USA 84:9123-9127 (1987);
Allen et al., Nature 333:852-855 (1988); Kothary et al., Nature
335:435-437 (1988); Gossler et al., Science 244:463-465 (1989)).
In such cases, the expression of the construct is regulated according to the
pattern of the newly associated enhancer. Transgene constructs having
only a minimal promoter can be used in the disclosed transgenic fish to
identify enhancers.

Preferred enhancers for use in the disclosed transgenic fish are 
those that mediate tissue- or cell lineage-specific expression. More 
preferred are homologous enhancers that mediate tissue- or cell
lineage-specific expression. Still more preferred are enhancers from fish
GATA-1 and GATA-2 genes. Most preferred are enhancers from zebrafish
GATA-1 and GATA-2 genes.

For expression of encoded peptides or proteins, a transgene
construct also needs sequences that, when transcribed into RNA, mediate
translation of the encoded expression products. Such sequences are
generally found in the 5' untranslated region of transcribed RNA. This
region corresponds to the region on the construct between the
transcription initiation site and the translation initiation site (that is, the
initiation codon). The 5' untranslated region of a construct can be
derived from the 5' untranslated region normally associated with the
promoter used in the construct, the 5' untranslated region normally
associated with the sequence encoding the expression product, the 5'
untranslated region of a gene unrelated to the promoter or sequence
encoding the expression product, or a hybrid of these 5' untranslated
regions. Preferably, the 5' untranslated region is homologous to the fish
into which the construct is to be introduced. Preferred 5' untranslated
regions are those normally associated with the promoter used.
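Because the 5' untranslated region is defined positionally, it can be read off a construct sequence once the transcription start site is known. The sketch below assumes a 0-based transcription start position on the sense strand and, as a simplification, takes the first ATG downstream of that position as the initiation codon; in real transcripts an upstream ATG is sometimes bypassed, so the result should be checked against the annotated coding sequence. The function name and example sequence are invented.

```python
def five_prime_utr(construct_seq, tss_index, start_codon="ATG"):
    """Return the 5' untranslated region encoded by a construct.

    The region runs from the transcription start site (tss_index, 0-based,
    sense strand) up to but not including the initiation codon. Taking the
    first downstream ATG as the initiation codon is a simplifying assumption.
    """
    seq = construct_seq.upper()
    start = seq.find(start_codon.upper(), tss_index)
    if start == -1:
        raise ValueError("no initiation codon downstream of the transcription start site")
    return seq[tss_index:start]


# Invented sequence: transcription starts at index 4, the ATG begins at index 12.
example = "TTTTACGTCCGAATGGCC"
```

With these inputs, `five_prime_utr(example, 4)` returns `"ACGTCCGA"`, the eight bases between the transcription start site and the initiation codon.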
B. Expression Products 

Transgene constructs for use in the disclosed transgenic fish can
encode any desired expression product, including peptides, proteins, and
RNA. Expression products can include reporter proteins (for detection
and quantitation of expression), and products having a biological effect on
cells in which they are expressed (by, for example, adding a new
enzymatic activity to the cell, or preventing expression of a gene). Many
such expression products are known or can be identified.
Reporter Proteins 
As used herein, a reporter protein is any protein that can be
specifically detected when expressed. Reporter proteins are useful for
detecting or quantitating expression from expression sequences. For
example, operatively linking a nucleotide sequence encoding a reporter
protein to tissue-specific expression sequences allows one to carefully
study lineage development. In such studies, the reporter protein serves as
a marker for monitoring developmental processes, such as cell migration.
Many reporter proteins are known and have been used for similar
purposes in other organisms. These include enzymes, such as
β-galactosidase, luciferase, and alkaline phosphatase, that can produce
specific detectable products, and proteins that can be directly detected.
Virtually any protein can be directly detected by using, for example,
specific antibodies to the protein. A preferred reporter protein that can be
directly detected is the green fluorescent protein (GFP). GFP, from the
jellyfish Aequorea victoria, produces fluorescence upon exposure to
ultraviolet light without the addition of a substrate (Chalfie et al., Science
263:802-5 (1994)). Recently, a number of modified GFPs have been
created that generate as much as 50-fold greater fluorescence than does
wild-type GFP under standard conditions (Cormack et al., Gene 173:33-8
(1996); Zolotukhin et al., J. Virol. 70:4646-54 (1996)). This level of
fluorescence allows the detection of low levels of tissue-specific
expression in a living transgenic animal.

Reporter proteins such as GFP, which can be detected in living embryos
without requiring the addition of exogenous factors, are preferred for
detecting or assessing gene expression during zebrafish embryonic
development. A transgenic zebrafish embryo carrying a construct
encoding a reporter protein and tissue-specific expression sequences
can provide a rapid real-time in vivo system for analyzing spatial and
temporal expression patterns of developmentally regulated genes.
C. Other Construct Sequences
The disclosed transgene constructs preferably include other
sequences which improve expression from, or stability of, the construct.
For example, including a polyadenylation signal on constructs
encoding a protein ensures that transcripts from the transgene will be
processed and transported as mRNA. The identification and use of
polyadenylation signals in expression constructs is well established. It is
preferred that homologous polyadenylation signals be used in the
transgene constructs.
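Checking that a construct carries a polyadenylation signal downstream of its coding region is a simple sequence search. The sketch below assumes the canonical eukaryotic hexamer AATAAA (variants such as ATTAAA are not found unless passed explicitly), and the function names and the `stop_index` input are hypothetical; an empty result is a prompt for closer inspection, not proof that no signal is present.

```python
import re


def find_polya_signals(seq, hexamer="AATAAA"):
    """Return 0-based start positions of every occurrence of the hexamer,
    including overlapping ones (found with a zero-width lookahead)."""
    pattern = re.compile("(?=" + re.escape(hexamer.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


def has_polya_downstream(seq, stop_index, hexamer="AATAAA"):
    """True if a signal starts at or after stop_index, for example just past
    the stop codon of the coding region (a hypothetical input)."""
    return any(pos >= stop_index for pos in find_polya_signals(seq, hexamer))


# Invented 3' end: coding sequence ending in TAA (indexes 6-8), AATAAA at 13.
tail = "ATGGCCTAAGCTTAATAAAGCA"
```

For this sequence, `find_polya_signals(tail)` returns `[13]` and `has_polya_downstream(tail, 9)` is `True`. The lookahead matters for A-rich sequences, where signals can overlap.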

It is also known that the presence of introns in primary transcripts
can increase expression, possibly by causing the transcript to enter the
processing and transport system for mRNA. It is preferred that an intron
[...] region of the transgene transcript. It is also preferred that the intron
be homologous to the fish used, and more preferably homologous to the
expression sequences used (that is, that the intron be from the same gene
that some or all of the expression sequences are from). The use and
importance of these and other components useful for transgene constructs
are discussed in, for example, [...] (1991); Sippel et al., "The Regulatory
Domain Organization of Eukaryotic Genomes: Implications For Stable
Gene Transfer" in Transgenic Animals (Grosveld and Kollias, eds.,
Academic Press, 1992), pages 1-26; Kollias and Grosveld, "The Study of
Gene Regulation in Transgenic Mice" in Transgenic Animals (Grosveld
and Kollias, eds., Academic Press, 1992), pages 79-98; and Clark et al.,
Phil. Trans. R. Soc. Lond. B 339:225-232 (1993).

The disclosed constructs are preferably integrated into the genome
of the fish. However, the disclosed transgene construct can also be
constructed as an artificial chromosome. Such artificial chromosomes
containing more than 200 kb have been used in several organisms.
Artificial chromosomes can be used to introduce very large transgene
constructs into fish. This technology is useful since it can allow faithful
recapitulation of the expression pattern of genes that have regulatory
elements that lie many kilobases from coding sequences.
2. Fish 

The disclosed constructs and methods can be used with any type
of fish. As used herein, fish refers to any member of the classes
collectively referred to as pisces. It is preferred that fish belonging to
species and varieties of fish of commercial or scientific interest be used.
Such fish include salmon, trout, tuna, halibut, catfish, zebrafish, medaka,
carp, tilapia, goldfish, and loach.

The most preferred fish for use with the disclosed constructs and
methods is zebrafish, Danio rerio. Zebrafish are an increasingly popular
experimental animal since they have many of the advantages of popular
invertebrate experimental organisms, and include the additional advantage
that they are vertebrates. Another significant advantage of zebrafish for
the study of development and cell lineages is that, like Caenorhabditis,
they are largely transparent (Kimmel, Trends Genet. 5:283-8 (1989)). The
generation of thousands of zebrafish mutants (Driever et al., Development
123:37-46 (1996); Haffter et al., Development 123:1-36 (1996)) provides
abundant raw material for transgenic study of these animals. General
zebrafish care and maintenance is described by Streisinger, Natl. Cancer
Inst. Monogr. 65:53-58 (1984).

Zebrafish embryos are easily accessible and nearly transparent.
Given these characteristics, a transgenic zebrafish embryo carrying a
construct encoding a reporter protein and tissue-specific expression
sequences can provide a rapid real-time in vivo system for analyzing
spatial and temporal expression patterns of developmentally regulated
genes. In addition, embryonic development of the zebrafish is extremely
rapid. In 24 hours an embryo develops rudiments of all the major organs,
including a [...] (Kimmel, Trends Genet. 5:283-8 (1989)). Other fish with
some or all of the same desirable characteristics are also preferred.

3. Production of Transgenic Fish

The disclosed transgenic fish are produced by introducing a
transgene construct into cells of a fish, preferably embryonic cells, and
most preferably a single-cell embryo. When the transgene construct is
introduced into embryonic cells, the transgenic fish is obtained by
allowing the embryonic cell or cells to develop into a fish. Introduction
of constructs into embryonic cells of fish, and subsequent development of
the fish, are simplified by the fact that embryos develop outside of the
parent fish in most fish species.

The disclosed transgene constructs can be introduced into
embryonic fish cells using any suitable technique. Many techniques for
such introduction of exogenous genetic material have been demonstrated
in fish and other animals. These include microinjection (described by, for
example, Culp et al. (1991)), electroporation (described by, for example,
[...] et al., Cell. Differ. Develop. 29:123-128 (1990); Müller et al.,
FEBS Lett. 324:27-32 (1993); Murakami et al., J. Biotechnol. 34:35-[...]
(1994); Müller et al., [...]; and Symonds et al., Aquaculture 119:313-327
(1994)), particle gun bombardment (Zelenin et al., FEBS Lett.
287:118-120 (1991)), and the use of liposomes (Szelei et al., Transgenic
Res. 3:116-119 (1994)). Microinjection is preferred. The preferred
method for introducing transgene constructs into fish embryonic cells by
microinjection is described in the examples.

Embryos or embryonic cells can generally be obtained by
collecting eggs immediately after they are laid. Depending on the type of 
fish, it is generally preferred that the eggs be fertilized prior to or at the 
time of collection. This is preferably accomplished by placing a male and 
female fish together in a tank that allows egg collection under conditions 
that stimulate mating. After collecting eggs, it is preferred that the 
embryo be exposed for introduction of genetic material by removing the 
chorion. This can be done manually or, preferably, by using a protease 
such as pronase. A preferred technique for collecting zebrafish eggs and 
preparing them for microinjection is described in the examples. A
fertilized egg cell prior to the first cell division is considered a one cell 
embryo, and the fertilized egg cell is thus considered an embryonic cell. 

After introduction of the transgene construct, the embryo is
allowed to develop into a fish. This generally involves no more than
incubating the embryos under the same conditions used for incubation of
eggs. However, the embryonic cells can also be incubated briefly in an 
isotonic buffer. If appropriate, expression of an introduced transgene 
construct can be observed during development of the embryo. 

Fish harboring a transgene can be identified by any suitable
means. For example, the genome of potential transgenic fish can be
probed for the presence of construct sequences. To identify transgenic
fish actually expressing the transgene, the presence of an expression
product can be assayed. Several techniques for such identification are
known and used for transgenic animals, and most can be applied to
transgenic fish. Probing of potential or actual transgenic fish for nucleic
acid sequences present in or characteristic of a transgene construct is
preferably accomplished by Southern or Northern blotting. Also
preferred is detection using polymerase chain reaction (PCR) or other
sequence-specific nucleic acid amplification techniques. Preferred
techniques for identifying transgenic zebrafish are described in the
examples.
4. Measuring the Pattern of Expression of Fish Genes

Assessment of the pattern of expression in the disclosed transgenic
fish can be accomplished by measuring expression of the transgene in
different tissues (tissue-specific expression), at different times during
development (developmentally regulated expression or
developmental stage-specific expression), or in different cell lineages (cell
lineage-specific expression). These assessments can also be combined by,
for example, measuring expression (and observing changes, if any) in a
cell lineage during development. The nature of the expression product to
be detected can have an effect on the suitability of some of these techniques.
On one level, different tissues of a fish can be dissected and expression
assayed. This approach works with almost any expression product. This technique is
commonly used in transgenic animals and is useful for assessing
tissue-specific expression.

This technique can also be used to assess expression during the
course of development by assaying for the expression product at different
developmental stages. Where detection of the expression product requires
destruction of the embryo or fish, multiple embryos must be used. This is
valid where the expression pattern in different embryos is expected to be the
same or similar, as is the case when using the disclosed
transgenic fish having stable and predictable expression.

A more preferred way of assessing the pattern of expression of a
transgene during development is to use an expression product that can be
detected in living embryos and animals. A preferred expression product
is the green fluorescent protein (GFP). A preferred form of
GFP and a preferred technique for measuring the presence of GFP in
living fish are described in the examples.

The expression product can be
detected using any appropriate method. Many means of detecting
expression products are known and can be applied to the detection of
expression products in transgenic fish. For example, RNA can be
detected using any of numerous nucleic acid detection techniques. Some
of these detection methods as applied to transgenic fish are described in
the examples. The use of reporter proteins as the expression product is
preferred since such proteins are selected based on their detectability.
The detection of several useful reporter proteins is described by Iyengar et
al. (1996).

In zebrafish, the nervous system and other organ rudiments 
appear within 24 hours of fertilization. Since the nearly transparent 
zebrafish embryo develops outside its mother, the origin and migration of 
lineage progenitor cells can be monitored by following expression of an 
expression product in transgenic fish. In addition, the regulation of a 
specific gene can be studied in these fish. 

Using zebrafish promoters that drive expression in specific
tissues, a number of transgenic zebrafish lines can be generated that
express a reporter protein in each of the major tissues, including the
notochord, the nervous system, the brain, the thymus, and other tissues
(see Table 1). Other important lineages for which specific expression can
be obtained include neural crest, germ cells, liver, gut, and kidney.
Additional tissue-specific transgenic fish can be generated by using
"enhancer trap" constructs to identify expression sequences in fish.
Table 1

Source of Expression Sequences    Tissues/Cell lineages
GATA-1                            Erythroid progenitor
GATA-2                            Hematopoietic stem cells/CNS
Tinman                            Heart
Rag-1                             T and B cells
Globin                            Mature red blood cells
MEF                               Muscle progenitors
Goosecoid                         Dorsal organizer
SCL-1                             Hematopoietic stem cells
Rbtn-2                            Hematopoietic stem cells
No-tail                           Notochord
Flk-1                             Vascular endothelia
Eve-1                             Ventral/posterior cells
Ikaros                            Early lymphoid progenitors
Pdx-1                             Pancreas
Islet-1                           Motoneuron
Shh                               Multi-tissue induction/Left-right symmetry
Twist                             Axial mesoderm/Left-right symmetry
Krox20                            Brain
BMP4                              Ventral mesoderm induction
5. Identifying Compounds That Affect Expression of Fish Genes

For many genes, and especially for genes involved in
developmental processes, it would be useful to identify compounds that
affect expression of the genes. The disclosed transgenic fish can be
exposed to compounds to assess the effect of a compound on the
expression of a gene of interest. For example, test compounds can be
administered to transgenic fish harboring an exogenous construct
containing the expression sequences of a fish gene of interest operably
linked to a sequence encoding a reporter protein. By comparing the
expression of the reporter protein in fish exposed to a test compound with
that in fish that are not exposed, the effect of the compound on the expression
of the gene from which the expression sequences are derived can be
assessed.
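The comparison between exposed and unexposed fish can be summarized numerically. The following Python sketch assumes, purely for illustration, that each fish is scored as reporter-positive or reporter-negative and that the two groups are compared with a standard two-proportion z statistic; the function name and the counts are hypothetical, and other read-outs (for example, measured fluorescence intensity) would call for a different test.

```python
import math


def two_proportion_z(pos_treated, n_treated, pos_control, n_control):
    """Two-proportion z statistic comparing the fraction of reporter-positive
    fish in a compound-treated group with an untreated control group."""
    p_treated = pos_treated / n_treated
    p_control = pos_control / n_control
    # Pooled proportion under the null hypothesis of no compound effect.
    pooled = (pos_treated + pos_control) / (n_treated + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    if se == 0:
        raise ValueError("all fish positive or all negative; z is undefined")
    return (p_treated - p_control) / se


# Hypothetical screen: 20 of 100 treated fish versus 50 of 100 untreated
# fish still show reporter expression in the tissue of interest.
z = two_proportion_z(20, 100, 50, 100)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests an effect at the 5% level (two-sided)
```

A negative z indicates that the compound reduced the fraction of reporter-positive fish; repeating the comparison for each test compound gives a simple ranking for follow-up.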

6. Identifying Genes That Affect Expression of Fish Genes 



Numerous mutants have been generated and characterized in
zebrafish which collectively affect most developmental processes. The
disclosed transgenic fish can be used in combination with these and other
mutations to assess the effect of a mutant gene on the expression of a
gene of interest. For example, mutations can be introduced into strains of
transgenic fish harboring an exogenous construct containing the
expression sequences of a fish gene of interest operably linked to a
sequence encoding a reporter protein. By comparing the expression of
the reporter protein in fish with a mutation to those without the mutation,
the effect of the mutation on the expression of the gene from which the
expression sequences are derived can be assessed.

The effect of such mutations on specific developmental processes
and on the growth and development of specific cell lineages can also be
assessed using the disclosed transgenic fish expressing a reporter protein
in specific cell lineages or at specific developmental stages.

7. Genetically Marking Mutant Fish Genes

The disclosed transgene constructs can be used to genetically
mark mutant genes or chromosome regions. For example, in zebrafish,
recent chemical mutagenesis screens have generated more than one
thousand different mutants with defects in most developmental processes.
If fish carrying a mutation generated in these screens could be more easily
identified, considerable time and labor would be saved. One way to promote
rapid identification of fish carrying mutations would be the establishment
of balancer chromosomes that carry markers that can be easily identified
in living fish. This technology has greatly facilitated the task of
identification and maintenance of mutant stocks in Drosophila (Ashburner,
Drosophila: A Laboratory Manual (Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, N.Y., 1989); Lindsley and Zimm, The Genome of
Drosophila melanogaster (Academic Press, San Diego, CA, 1995)). As
used herein, genetically marking a gene or chromosome region refers to
genetically linking a reporter gene to the gene or chromosome region.
Genetic linkage between two genetic elements (such as genes) refers to
the elements being in sufficiently close proximity on a chromosome that
they do not segregate from each other at random in genetic crosses. The
closer the genetic linkage, the more likely that the two elements will
segregate together. For genetic marking, it is preferred that the transgene
construct segregate with the gene or chromosomal region of interest more
than 60% of the time, more preferably more than 70% of the time, still
more preferably more than 80% of the time, still more preferably more
than 90% of the time, and most preferably more than 95% of the time.
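The segregation thresholds above can be checked against progeny counts. The Python sketch below assumes a test cross in which one parent carries the transgene construct and the mutation on the same chromosome and the other parent carries neither, with every progeny fish scored for both; the parental classes (both or neither) count as cosegregation and the two recombinant classes do not. The function names and counts are hypothetical.

```python
# Preference thresholds for genetic marking, from the text (percent of time
# the transgene construct segregates with the marked gene or region).
THRESHOLDS = (95, 90, 80, 70, 60)


def cosegregation_percent(both, construct_only, mutation_only, neither):
    """Percent of test-cross progeny in the parental (non-recombinant) classes.

    Assumes the carrier parent has the construct and the mutation on the same
    chromosome and that the other parent carries neither."""
    parental = both + neither
    total = parental + construct_only + mutation_only
    if total == 0:
        raise ValueError("no progeny scored")
    return 100.0 * parental / total


def highest_threshold_met(percent):
    """Largest threshold from the text that the observed value exceeds."""
    for threshold in THRESHOLDS:
        if percent > threshold:
            return threshold
    return None


# Hypothetical counts from 100 scored progeny.
observed = cosegregation_percent(both=47, construct_only=4, mutation_only=3, neither=46)
print(observed, highest_threshold_met(observed))
```

The recombinant fraction (100% minus the cosegregation percentage) is the usual estimate of genetic distance between the construct and the mutation.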

Example 1 shows that living transgenic fish carrying insertions of
a transgene, in which the zebrafish GATA-1 promoter has been ligated to
the green fluorescent protein (GFP) reporter gene, can be identified by
simple observation of GFP expression in blood cells. As in Drosophila,
zebrafish chromosomal recombination occurs at a significantly lower rate
during spermatogenesis than it does during oogenesis. Therefore, a
transgene insertion that maps near a chemically induced mutant gene can
be crossed into the mutant chromosome through oogenesis and will then
remain linked to the mutation in male fish through many generations.
This procedure will allow the identification of progeny harboring the
mutant gene by simple observation of GFP in blood cells.

In the case of zebrafish, 200 lines carrying the GATA-1/GFP
transgene (or another reporter construct) randomly inserted throughout
the zebrafish genome should result in an average of 8 insertions in each of
the 25 zebrafish chromosomes. This is possible since expression from the
disclosed constructs is not limited by effects of the site of insertion, and
the site of integration is not limited. The insertion sites can be mapped
and then crossed through oogenesis into zebrafish lines that carry a
mutation that maps nearby. Once established, mutant strains that carry
balancer chromosomes can be maintained in male fish.
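The figure of 8 insertions per chromosome follows from dividing 200 lines by 25 chromosomes. The Python sketch below also estimates how likely any one chromosome is to receive no insertion at all, under a simplifying assumption that is not made in the text: each insertion lands on each chromosome with equal probability, independently of chromosome size and of the other insertions.

```python
lines = 200        # independent transgenic lines, from the text
chromosomes = 25   # zebrafish chromosomes, from the text

mean_per_chromosome = lines / chromosomes

# Assumption: uniform, independent placement of insertions across chromosomes.
p_chromosome_unmarked = (1 - 1 / chromosomes) ** lines
expected_unmarked = chromosomes * p_chromosome_unmarked

print(f"mean insertions per chromosome: {mean_per_chromosome:.1f}")
print(f"P(a given chromosome has no insertion): {p_chromosome_unmarked:.2e}")
print(f"expected unmarked chromosomes: {expected_unmarked:.3f}")
```

Real insertion density will vary with chromosome length and any integration-site preferences, so this is only a rough sanity check on the claim that 200 lines should mark every chromosome.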

Although it is preferred that mutant genes be genetically marked, 
any gene of interest or any chromosome region can be marked, and the 
maintenance and inheritance of the gene can be monitored, in a similar 
manner. As used herein, an identified mutant gene is a mutant gene that 
is known or that has been identified, in contrast to a mutant gene which 
may be present in an organism but which has not been recognized. 

Genetic mapping of mutant genes or transgenes in fish can be
performed using established techniques and the principles of genetic
crosses. Generally, mapping involves determining the linkage
relationships between genetic elements by assessing whether, and to what
extent, two or more genetic elements tend to cosegregate in genetic
crosses.

8. Identifying Fish That Have Inherited a Mutant Gene
Mutant fish in which the mutant gene is marked with an
exogenous construct expressing a reporter protein simplify the
identification of progeny fish that carry the mutant gene. For example,
after a cross, progeny fish can be screened for expression of the reporter
protein. Those that express the reporter protein are very likely to have
inherited the mutant gene, to which the construct is genetically linked. Those
progeny fish not expressing the reporter protein can be excluded from further analysis.
Although recombination during gametogenesis may result in
segregation of the exogenous construct from the mutant gene, this will
happen only rarely. Initial screening for fish expressing the reporter
protein will still ensure that the majority of such progeny fish carry
the mutant gene. The presence of the mutant gene can be confirmed by
subsequent direct testing.
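The effect of occasional recombination on this screen can be estimated. The Python sketch below assumes a cross between a fish heterozygous for the marked mutant chromosome and a fish carrying neither the construct nor the mutation, with a recombination fraction r between the construct and the mutant gene; half of the progeny are then expected to express the reporter, and a fraction r of those are expected to be recombinants lacking the mutation. The function name and numbers are illustrative only.

```python
def expected_screen_outcome(progeny, recombination_fraction):
    """Expected counts after screening progeny for reporter expression.

    Returns (reporter-positive, reporter-positive carriers of the mutation,
    reporter-positive recombinants without the mutation)."""
    if not 0 <= recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    reporter_positive = progeny / 2
    carriers = reporter_positive * (1 - recombination_fraction)
    recombinants = reporter_positive * recombination_fraction
    return reporter_positive, carriers, recombinants


# Hypothetical: 200 progeny, construct about 5 map units from the mutation.
positive, carriers, recombinants = expected_screen_outcome(200, 0.05)
print(positive, carriers, recombinants)
```

By the same reasoning, about the same number of reporter-negative progeny are expected to carry the mutation; they are lost from the screen but do not contaminate it.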
9. Identifying and Cloning Regulatory Sequences from Fish

The disclosed constructs can also be used as "enhancer traps" to
generate transgenic fish that exhibit tissue-specific expression of an
expression product. Transgenic animals carrying enhancer trap constructs
often exhibit tissue-specific expression patterns due to the effects of
endogenous enhancer elements that lie near the position of integration.

Once it is determined that the exogenous construct is operably
linked to an enhancer or other regulatory sequence in a fish, the
regulatory element can be isolated by re-cloning the transgene construct.
Many general cloning techniques can be used for this purpose. A
preferred method of cloning regulatory sequences that have become linked
to a transgene construct in a fish is to isolate and cleave genomic DNA
from the fish with a restriction enzyme that does not cleave the exogenous
construct. The resulting fragments can be cloned in vitro and screened
for the presence of characteristic transgene sequences. A search for
enhancers in zebrafish using a transgene construct having only a promoter
operably linked to a sequence encoding a protein has generated a
transgenic line that expresses GFP exclusively in hatching gland cells.
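Choosing a restriction enzyme that does not cleave the exogenous construct amounts to checking that the enzyme's recognition sequence is absent from the construct on either strand. The Python sketch below does this for a small, illustrative panel of enzymes (EcoRI, BamHI and HindIII, with their standard recognition sequences); the construct sequence is a hypothetical placeholder, and the check ignores methylation sensitivity and, for a circular plasmid, the junction between the two ends of the sequence.

```python
# Illustrative enzyme panel with standard recognition sequences (5'->3').
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_positions(seq, site):
    """0-based start positions of a recognition site on either strand."""
    seq = seq.upper()
    positions = set()
    for motif in {site, reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            positions.add(start)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def non_cutting_enzymes(construct, enzymes=ENZYMES):
    """Enzymes whose recognition site does not occur in the construct."""
    return [name for name, site in enzymes.items()
            if not site_positions(construct, site)]


# Hypothetical construct fragment containing a BamHI and an EcoRI site.
construct = "AAAGGATCCTTTGAATTCAAA"
print(non_cutting_enzymes(construct))
```

Genomic fragments cut with an enzyme from the returned list keep the whole construct, together with flanking fish DNA, on a single fragment.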

A similar procedure can be followed to identify promoters. In
this case, a "promoter probe" construct, which lacks any expression
sequences, is used. Only if the construct is inserted into the genome
downstream of expression sequences will the expression product encoded
by the construct be expressed.

10. Identifying Promoters and Enhancers in Cloned Expression Sequences
The linked genomic sequences of clones identified as containing
expression sequences, or any other nucleic acid segment containing
expression sequences, can then be characterized to identify potential and
actual regulatory sequences. For example, a deletion series of a positive
clone can be tested for expression in transgenic fish. Sequences essential
for expression, or for a pattern of expression, are identified as those
which, when deleted from a construct, no longer support expression or
the pattern of expression. The ability to assess the pattern of expression
of a transgene in fish using the disclosed transgenic fish and methods
makes it possible to identify the elements in the regulatory sequences of a
fish gene that are responsible for the pattern of expression. The disclosed
transgenic fish, since they can be produced routinely and consistently,
allow meaningful comparison of the expression of different deletion
constructs in separate fish.
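The logic of a deletion series can be made explicit. The Python sketch below assumes a single essential element, deletions given as hypothetical half-open coordinate intervals within a cloned fragment, and a yes/no expression call for each construct: positions removed by a construct that still expresses are ruled out, and the element must lie within every deletion that abolishes expression.

```python
def candidate_positions(length, constructs):
    """Positions (0-based) that could hold a single essential element.

    constructs: list of ((start, end), expressed) where (start, end) is a
    half-open interval deleted from a fragment of the given length and
    expressed is True if the deletion construct still supports expression."""
    candidates = set(range(length))
    for (start, end), expressed in constructs:
        deleted = set(range(start, end))
        if expressed:
            candidates -= deleted  # element cannot lie in a dispensable region
        else:
            candidates &= deleted  # element must lie inside this deletion
    return sorted(candidates)


def to_intervals(positions):
    """Collapse sorted positions into half-open (start, end) intervals."""
    intervals = []
    for p in positions:
        if intervals and intervals[-1][1] == p:
            intervals[-1][1] = p + 1
        else:
            intervals.append([p, p + 1])
    return [tuple(iv) for iv in intervals]


# Hypothetical deletion series across a 1,000 bp fragment.
series = [
    ((0, 300), True),
    ((300, 600), False),
    ((250, 450), False),
    ((450, 1000), True),
]
print(to_intervals(candidate_positions(1000, series)))
```

If deletions of two separate regions both abolish expression, the candidate set can come out empty, which itself indicates that more than one element is required.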

An example of the power of this capability is described in
Example 2. Application of this system to the study of the GATA-2
promoter has led to identification of enhancer regions that facilitate gene
expression specifically in hematopoietic precursors, the enveloping layer
(EVL), and the central nervous system (CNS). Through site-directed
mutagenesis, it has been discovered that the DNA sequence CCCTCCT is
essential for the neuron-specific activity of the GATA-2 promoter.
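Locating copies of a short element such as CCCTCCT in a promoter sequence, for example to choose targets for site-directed mutagenesis, requires searching both strands. A minimal Python sketch follows; the input sequence is a hypothetical placeholder, not GATA-2 promoter sequence.

```python
ELEMENT = "CCCTCCT"  # element reported as essential for neuron-specific activity

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def element_hits(seq, element=ELEMENT):
    """(position, strand) for each exact match of element on either strand.

    Positions are 0-based starts in seq; '-' hits are matches of the
    reverse complement of element read on the given strand."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", element), ("-", reverse_complement(element))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical sequence with one copy of the element on each strand.
print(element_hits("TTCCCTCCTAAAGGAGGGTT"))
```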

11. Isolating Cells Expressing An Expression Product 

Using cell sorting based on the presence of an expression product, 
pure populations of cells expressing a transgene construct can be isolated 
from other cells. Where the transgene construct is expressed in particular 
cell lineages or tissues, this can allow the purification of cells from that 
particular lineage. These cells can be used in a variety of in vitro studies. 
For instance, these pure cell populations can provide mRNA for 
differential display or subtractive screens for identifying genes expressed 
in that cell lineage. Progenitor cells of specific tissues could also be
isolated. Establishing such cells in tissue culture would allow the growth 
factor needs of these cells to be determined. Such knowledge could be 
used to culture non-transgenic forms of the same cells or related cells in 
other organisms. 

Cell sorting is preferably facilitated by using a construct 
expressing a fluorescent protein or an enzyme producing a fluorescent 
product. This allows fluorescence activated cell sorting (FACS). A 
preferred fluorescent protein for this purpose is the green fluorescent 
protein. The ability to generate transgenic fish expressing GFP in a
tissue- and cell lineage-specific manner for different cell types indicates
that transgenic fish that express GFP in other tissues can be
generated in a straightforward manner. The disclosed FACS approach
can therefore be used as a general method for isolating pure cell
populations from developing embryos based solely on gene expression
patterns. This method for isolation of specific cell lineages is preferably
performed using constructs linking GFP with the expression sequences of
genes identified as being involved in development. Numerous such genes
have been or can be identified as mutants that affect development. Cells
isolated in this manner should be useful in transplantation experiments.
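A common way to set a sorting gate, assumed here rather than taken from the examples, is to place the fluorescence threshold just above nearly all events from non-transgenic control cells. The Python sketch below uses a nearest-rank percentile of control FITC intensities; the 99.5th percentile, the function names and the intensity values are hypothetical, and real sorts also gate on scatter parameters in the instrument software.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gfp_gate(sample, control, pct=99.5):
    """Threshold from non-transgenic control cells and the sample events above it."""
    threshold = nearest_rank_percentile(control, pct)
    positives = [value for value in sample if value > threshold]
    return threshold, positives


# Hypothetical FITC intensities (arbitrary units).
control_cells = list(range(1, 201))   # cells from non-transgenic embryos
sample_cells = [5, 150, 250, 400]     # cells from transgenic embryos
threshold, positives = gfp_gate(sample_cells, control_cells)
print(threshold, positives)
```

A stricter percentile lowers contamination by autofluorescent non-transgenic cells at the cost of discarding dimly expressing cells.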

Examples 

Example 1: Tissue-specific Expression and Germline Transmission
of a Transgene in Zebrafish.
In this example, DNA constructs containing the putative zebrafish
expression sequences of GATA-1, an erythroid-specific transcription
factor, operatively linked to a sequence encoding the green fluorescent
protein (GFP), were microinjected into single-cell zebrafish embryos.

GATA-1, an early marker of the erythroid lineage, was initially
identified through its effects upon globin gene expression (Evans and
Felsenfeld, Cell 58:877-85 (1989); Tsai et al., Nature 339:446-51
(1989)). Since then, GATA-1 has been shown to be a member of a
multigene family. Members of this gene family encode transcription
factors that recognize the DNA core consensus sequence WGATAR
(SEQ ID NOa8). GATA factors are key regulators of many important
developmental processes in vertebrates, particularly hematopoiesis (Orkin,
Blood 80:575-81 (1992)). The importance of GATA-1 for hematopoiesis
was definitively demonstrated by null mutations in mouse (Pevny et al.,
Nature 349:257-60 (1991)). In chimeric mice, embryonic stem cells
carrying a null mutation in GATA-1, created via homologous
recombination, contributed to all non-hematopoietic tissues tested and to a
white blood cell fraction, but failed to give rise to mature red blood cells.
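The GATA consensus WGATAR uses IUPAC ambiguity codes (W = A or T, R = A or G). The Python sketch below converts such a motif into a regular expression and reports overlapping matches on both strands of a sequence; the sequence is hypothetical, and a match to the consensus alone does not show that a GATA factor binds there.

```python
import re

# IUPAC nucleotide codes used here, as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# Complement of each IUPAC code.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "W": "W", "S": "S", "R": "Y", "Y": "R",
    "K": "M", "M": "K", "N": "N",
}


def reverse_complement_motif(motif):
    return "".join(IUPAC_COMPLEMENT[b] for b in reversed(motif.upper()))


def motif_regex(motif):
    # Zero-width lookahead so that overlapping matches are all reported.
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")


def find_on_both_strands(seq, motif):
    """0-based start positions in seq of matches to motif ('+') and to its
    reverse complement ('-')."""
    seq = seq.upper()
    return {
        "+": [m.start() for m in motif_regex(motif).finditer(seq)],
        "-": [m.start() for m in
              motif_regex(reverse_complement_motif(motif)).finditer(seq)],
    }


print(find_on_both_strands("CCAGATAAGGTTATCTGG", "WGATAR"))
```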

In zebrafish, GATA-1 expression is restricted to erythroid
progenitor cells that initially occupy a ventral extra-embryonic position,
similar to the situation found in other vertebrates (Detrich et al., Proc
Natl Acad Sci USA 92:10713-7 (1995)). As development proceeds,
these cells enter the zebrafish embryo and form a distinct structure known
as the hematopoietic intermediate cell mass (ICM).

Vertebrate hematopoiesis is a complex process that proceeds in
distinct phases, at various anatomic sites, during development (Zon,
Blood 86:2876-91 (1995)). Although studies on in vitro model systems
have generated some insight into hematopoietic development (Cumano et
al., Cell 86:907-16 (1996); Kennedy et al., Nature 386:488-493 (1997);
Medvinsky and Dzierzak, Cell 86:897-906 (1996); Nakano et al., Science
272:722-4 (1996)), the origin of hematopoietic progenitor cells during
vertebrate embryogenesis is still controversial. Therefore, an in vivo
model should be useful to determine precisely the cellular and molecular
mechanisms involved in hematopoietic development. Such a model could
also be used to identify compounds and genes that affect hematopoiesis.
In mammals, since embryogenesis occurs internally, it is difficult to
carefully observe hematopoietic processes.

Zebrafish have a number of features that facilitate the study of
vertebrate hematopoiesis. Because development is external and embryos
are nearly transparent, the migration of labeled hematopoietic cells can be
easily monitored. In addition, many mutants that are defective in
hematopoietic development have been generated (Ransom et al.,
Development 123:311-319 (1996); Weinstein et al., Development 123:303-309
(1996)). Zebrafish embryos that substantially lack circulating blood
can survive for several days, so the downstream effects on gene expression
of mutations deleterious to embryonic hematopoietic development can
be characterized. Since the cellular processes and molecular regulation of
hematopoiesis are generally conserved throughout vertebrate evolution,
... mechanisms involved in mammalian hematopoiesis.

Cloning and sequencing of GATA-1 genomic DNA
A zebrafish genomic phage library was screened with a 32P-radiolabeled
probe containing a region of zebrafish GATA-1 cDNA that
encodes a conserved zinc finger. A number of positive clones were
identified. The inserts in these clones were cut with various restriction
enzymes. The resulting fragments were subcloned into pBluescript II
KS(-) and sequenced. Based on DNA sequence analysis, two phage
clones were shown to contain zebrafish GATA-1 sequences. The cDNA
sequence of zebrafish GATA-1 is described by Detrich et al., Proc Natl
Acad Sci USA 92:10713 (1995). The nucleotide sequence of the GATA-1
promoter region is shown in SEQ ID NO:26.
Plasmid constructs 

Construct G1-(Bgl)-GM2 was generated by ligating a modified
GFP reporter gene (GM2) to a 5.4 kb EcoRI-BglII fragment that contains
putative zebrafish GATA-1 expression sequences, that is, the 5' flanking
sequences upstream of the major GATA-1 transcription start site. GM2
is composed of sequences from wild type GFP and a 3' NcoI-EcoRI fragment
derived from a GFP variant, m2, that emits approximately 30-fold greater fluorescence
than does the wild type GFP under standard FITC conditions (Cormack et
al., Gene 173:33-8 (1996)). This construct is illustrated as construct (1)
in Figure 2.
To isolate expression sequences in the 5' untranslated region of
GATA-1, a 5.6 kb DNA fragment was amplified by the polymerase chain
reaction (PCR) from a GATA-1 genomic subclone using a T7 primer,
which is complementary to the vector sequence, and a specific primer,
Oligo (1), that is complementary to the cDNA sequence just 5' of the
GATA-1 translation start. The GATA-1 specific primer contained a
BamHI site to facilitate subsequent cloning. The PCR reaction was
performed using the Expand Long Template PCR System (Boehringer
Mannheim) for 30 cycles (94°C, 30 seconds; 60°C, 30 seconds; 68°C, 5
minutes). After digestion with BamHI and XhoI, this 5.6 kb DNA
fragment was gel purified and ligated to DNA encoding the modified
GFP, resulting in construct G1-GM2 (construct (2) in Figure 2). The
construct G1-(5/3)-GM2 was generated by ligating an additional 4 kb of
GATA-1 genomic sequences, which contain GATA-1 intron and exon
sequences, to the 3' end (following the polyadenylation signal) of the
reporter gene in construct G1-GM2. This construct is illustrated as
construct (3) in Figure 2.

Fish and Microinjection 
Wild type zebrafish embryos were used for all microinjections.
The zebrafish were originally obtained from pet shops (Culp et al., Proc
Natl Acad Sci USA 88:7953-7 (1991)). Fish were maintained in reverse
osmosis-purified water to which Instant Ocean (Aquarium Systems,
Mentor, OH) was added (50 mg/l). Plasmid DNA G1-GM2 was
linearized using restriction enzyme AatII (which cuts in the vector
backbone), while plasmid DNA G1-(5/3)-GM2 was excised from the
vector by digestion with restriction enzyme SacI and separated using a
low-melting-point agarose gel. DNA fragments were cleaned using the
GENECLEAN II Kit (Bio101 Inc.) and resuspended in 5 mM Tris, 0.5
mM EDTA, 0.1 M KCl at a final concentration of 50 µg/ml prior to
microinjection. Single-cell embryos were prepared and injected as
described by Culp et al., Proc Natl Acad Sci USA 88:7953-7 (1991),
except that tetramethyl-rhodamine dextran was included as an injection
control. This involved collecting newly fertilized eggs, dechorionating
the eggs with pronase (used at 0.5 mg/ml), and injecting DNA. Injection
with each construct was done independently 5 to 10 times, and the data
obtained were pooled.

Fluorescent microscopic observation and imaging 
Embryos and adult fish were anesthetized using tricaine (Sigma
A-5040) as described previously (Westerfield, The Zebrafish Book
(University of Oregon Press, 1995)) and examined under a FITC filter on
a Zeiss microscope equipped with a video camera. Images of circulating
blood cells were produced by printing out individual frames of recorded
videos. Other pictures of fluorescent embryos were generated by
superimposing a bright field image on a fluorescent image using Adobe
Photoshop software. One-month-old fish were anesthetized and then
rapidly embedded in OCT. Sections of 60 µm were cut using a cryostat
and were immediately observed by fluorescence microscopy.
Identification of germline transgenic fish by PCR
DNA isolation, internal control primers and PCR conditions were
the same as described by Lin et al., Dev Biol 161:77-83 (1994). Briefly,
DNA was extracted from pools of 40 to several hundred dechorionated
embryos (obtained from mating a single pair of fish) at 16 to 24 hours of
development by vortexing for 1 minute in a buffer containing 4 M
guanidinium isothiocyanate, 0.25 mM sodium citrate (pH 7.0), 0.5%
Sarkosyl, and 0.1 M β-mercaptoethanol. The sample was extracted once with
phenol:chloroform:isoamyl alcohol (25:24:1), and total nucleic acid was
precipitated by the addition of 3 volumes of ethanol and 1/10 volume
sodium acetate (3 M, pH 5.5). The pellet was washed once in 70%
ethanol and dissolved in 1X TE (pH 8.0).

Approximately 0.5 µg of DNA was used in a PCR reaction
containing 20 mM Tris (pH 8.3), 1.5 mM MgCl2, 25 mM KCl, 100
µg/ml gelatin, 20 pmole of each PCR primer, 50 µM of each dNTP, and
2.5 U of Taq DNA polymerase (Pharmacia). The reaction was carried out at 94°C
for 2.5 minutes for 30 cycles, with a 5 minute initial 94°C denaturation
and a 7 minute final 72°C elongation step. Specific primers, Oligos
(2) and (3), that were used to detect GFP generated a 267 bp product. A
pair of internal control primers homologous to sequences of the zebrafish
homeobox gene ZF-21 (Njolstad et al., FEBS Letters 230:25-30 (1988))
was included in each reaction. This pair of primers should generate a
PCR product of 475 bp for all PCR reactions using zebrafish DNA.
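Reading each pooled reaction therefore comes down to two questions: is the 475 bp control band present, and is the 267 bp GFP band present? The Python sketch below encodes that decision; the band sizes come from the text, while the sizing tolerance, function names and example inputs are assumptions for illustration.

```python
TRANSGENE_BP = 267  # GFP product from Oligos (2) and (3)
CONTROL_BP = 475    # ZF-21 internal control product
TOLERANCE_BP = 15   # assumed sizing tolerance when reading bands off a gel


def band_present(band_sizes, target):
    return any(abs(size - target) <= TOLERANCE_BP for size in band_sizes)


def call_pool(band_sizes):
    """Classify one pooled-embryo PCR reaction from its observed band sizes."""
    if not band_present(band_sizes, CONTROL_BP):
        return "failed"      # no control band: reaction or template problem
    if band_present(band_sizes, TRANSGENE_BP):
        return "transgenic"  # at least one parent of this clutch transmits GFP
    return "negative"


print(call_pool([475, 267]), call_pool([480]), call_pool([267]))
```

Requiring the control band before calling a pool negative keeps failed reactions from being mistaken for non-transmitting founders.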
Preparation of embryonic cells and flow cytometry
Embryos were disrupted in Holtfreter's solution using a 1.5 ml
pellet pestle (Kontes Glass, OEM749521-1590). Cells were collected by
centrifugation (400 g, 5 minutes). After digestion with 1X
trypsin/EDTA for 15 minutes at 32°C, the cells were washed twice with
phosphate buffered saline (PBS) and filtered through a 40 micron nylon
mesh. Fluorescence activated cell sorting (FACS) was performed under
standard FITC conditions.

cDNA synthesis and PCR

Total RNA was extracted from FACS-purified cells using the
RNA isolation kit TRIzol (Bio101). Reverse transcription and PCR
(RT-PCR) were performed using the Access RT-PCR System from
Promega (Catalog # A1250). Specific primers, Oligos (4) and (5), used
to detect the zebrafish GATA-1 cDNA, generated a 410 bp product.
Oligonucleotides

(1) 5'-CCGGATCCTGCAAGTGTAGTATTGAA-3' (GATA-1 promoter antisense; SEQ ID NO:1)
(2) 5'-AATGTATCAATCATGGCAGAC-3' (GM2 sense; SEQ ID NO:2)
(3) 5'-TGTATAGTTCATCCATGCCATGTG-3' (GM2 antisense; SEQ ID NO:3)
(4) 5'-ATGAACCTTTCTACTCAAGCT-3' (GATA-1 cDNA sense; SEQ ID NO:4)
(5) 5'-GCTGCTTCCACTTCCACTCAT-3' (GATA-1 cDNA antisense; SEQ ID NO:5)
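The primer list can be sanity-checked programmatically, for example to confirm that Oligo (1) carries the BamHI site (GGATCC) mentioned in the construct description. The Python sketch below reports length, GC content and the Wallace-rule estimate Tm = 2(A+T) + 4(G+C) °C; the Wallace rule is a rough approximation intended for short oligonucleotides and is used here only for illustration, not as the annealing temperature used in the examples.

```python
OLIGOS = {
    1: "CCGGATCCTGCAAGTGTAGTATTGAA",  # GATA-1 promoter antisense
    2: "AATGTATCAATCATGGCAGAC",       # GM2 sense
    3: "TGTATAGTTCATCCATGCCATGTG",    # GM2 antisense
    4: "ATGAACCTTTCTACTCAAGCT",       # GATA-1 cDNA sense
    5: "GCTGCTTCCACTTCCACTCAT",       # GATA-1 cDNA antisense
}
BAMHI_SITE = "GGATCC"


def gc_percent(seq):
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C): 2*(A+T) + 4*(G+C)."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


for number, seq in OLIGOS.items():
    print(f"Oligo ({number}): {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
          f"Wallace Tm ~{wallace_tm(seq)} C")

print("BamHI site in Oligo (1) at position", OLIGOS[1].find(BAMHI_SITE))
```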

Whole-mount RNA in situ hybridization
Sense and antisense digoxigenin-labeled RNA probes were
generated from a GATA-1 genomic subclone containing the second and
third exon coding sequences using a DIG/Genius™ 4 RNA Labeling Kit
(SP6/T7) (Boehringer Mannheim). RNA in situ hybridizations were
performed as described (Westerfield, The Zebrafish Book (University of
Oregon Press, 1995)).
Genomic structure of the zebrafish GATA-1 gene

Two clones containing zebrafish GATA-1 sequences were isolated
from a lambda phage zebrafish genomic library as described above.
Restriction enzyme mapping indicated that the two overlapping clones
contained approximately 35 kb of the GATA-1 locus. To define the
promoter of the zebrafish GATA-1 gene, transcription initiation sites for
... mouse, human and other species, multiple transcription initiation sites
were identified. A major transcription initiation site was mapped 187
bases upstream of the translation start.

... and chicken suggested that the intron-exon junction sequences of this gene
are likely to be conserved throughout vertebrates. Oligonucleotide
primers flanking potential GATA-1 introns were designed and used to
sequence the zebrafish genomic clones. Sequence analysis revealed that
the zebrafish GATA-1 gene consists of five exons and four introns, which
lie within a 6.5 kb genomic region (Figure 1). Although the exon-intron
number and junction sequences are well conserved between zebrafish and
other vertebrates, the zebrafish GATA-1 introns are smaller than in other
vertebrates.
Expression of GFP driven by the GATA-1 promoter
in zebrafish embryos

Based on the zebrafish GATA-1 genomic structure, three GFP
reporter gene constructs were generated (Figure 2). Construct
G1-(Bgl)-GM2 was generated by ligation of a modified GFP reporter gene
(GM2) to a 5.4 kb EcoRI-BglII fragment containing the 5' flanking
sequences upstream of the major GATA-1 transcription start site.
Construct G1-GM2 contained a 5.6 kb region upstream of the translation
start of GATA-1. A third construct, G1-(5/3)-GM2, was generated by
ligating an additional 4 kb of GATA-1 genomic sequences, which contain
GATA-1 intron and exon sequences, to the 3' end of the reporter gene in
G1-GM2. Each construct was microinjected into the cytoplasm of
single-cell zebrafish embryos. GFP reporter gene expression in the embryos
was examined at a number of distinct developmental stages by
fluorescence microscopy.
GFP expression was observed in embryos injected with either
construct G1-GM2 or construct G1-(5/3)-GM2 as early as 80% epiboly,
approximately 8 hours post fertilization (pf). At that time, GFP-positive
cells were restricted to the ventral region of the injected embryos. At 16
hours pf, GFP expression was clearly visible in the developing
intermediate cell mass (ICM), the earliest hematopoietic tissue in
zebrafish. After 24 hours pf, GFP-positive cells were observed in
circulating blood and could be continuously observed in circulating blood
for several months. During the first five days pf, examination of
circulating blood revealed two distinct cell populations with different
levels of GFP expression: one cell type was larger and brighter, the
other smaller and less bright. No significant difference in GFP
expression levels was detected between embryos injected with construct
G1-GM2 and those injected with construct G1-(5/3)-GM2. However, injection of construct
G1-(Bgl)-GM2 yielded very weak GFP expression in developing embryos.
This result indicated either that the GATA-1 transcription initiation site
was removed by BglII restriction digestion, or that the 5' untranslated
region of zebrafish GATA-1 is required for high-level tissue-specific
expression of GFP. It is not surprising that a construct lacking the 5'
untranslated region of GATA-1 did not generate much GFP expression in
microinjected embryos. These regions are often needed for transcript
stability, and at times they also contain binding sites for regulators
of gene expression.

At least 75% of the embryos injected with the G1-GM2 or
G1-(5/3)-GM2 construct showed some degree of ICM-specific GFP
expression (Table 2). The number of GFP-positive cells in the ICM or in
circulation ranged from a single cell to a few hundred cells. Less than
7% of these embryos showed GFP expression in non-hematopoietic
tissues. This non-specific expression was usually observed in the
notochord, muscle, and enveloping cell layer, and was limited to no more
than 10 cells per embryo. These observations indicated that a genomic GATA-1 fragment
extending approximately 5.6 kb upstream from the GATA-1 translation
start site, ligated to GFP, sufficed to recapitulate the embryonic pattern of
GATA-1 expression in zebrafish.

Table 2

Construct      No. embryos  No. embryos with  No. embryos with   No. embryos with
               observed     GFP expression    strong GFP         non-specific
                            in ICM (%)        expression in      GFP (%)
                                              ICM (%)*
G1-GM2                      274 (81.5)        177 (52.7)         15 (4.5)
G1-(5/3)-GM2   248          187 (75.4)        150 (60.5)         16 (6.5)
G1-(Bgl)-GM2   370          0 (0)             0 (0)              19 (5.1)

*Strong GFP expression means that each embryo has more than three
fluorescent cells in the ICM.
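The number of observed embryos for G1-GM2 is not legible in Table 2. As a rough arithmetic check, the sketch below recomputes each percentage from its count (rounded to one decimal place, as in the table) and searches for the denominators that would reproduce all three reported G1-GM2 percentages. It assumes, without confirmation from the source, that every percentage in a row shares one denominator; the names `ROWS`, `pct`, and `candidate_totals` are illustrative only.

```python
# Counts and reported percentages transcribed from Table 2.
# Assumption (not stated in the source): every percentage in a row uses the
# same denominator, the number of embryos observed for that construct.
ROWS = {
    "G1-(5/3)-GM2": (248, [(187, 75.4), (150, 60.5), (16, 6.5)]),
    "G1-(Bgl)-GM2": (370, [(0, 0.0), (0, 0.0), (19, 5.1)]),
}
G1_GM2_CELLS = [(274, 81.5), (177, 52.7), (15, 4.5)]  # observed count not legible


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in Table 2."""
    return round(100 * count / total, 1)


def row_is_consistent(total: int, cells: list[tuple[int, float]]) -> bool:
    """True if every (count, reported %) pair equals pct(count, total)."""
    return all(pct(count, total) == reported for count, reported in cells)


def candidate_totals(cells: list[tuple[int, float]], upper: int = 2000) -> list[int]:
    """Every denominator up to `upper` that reproduces all reported percentages."""
    lowest = max(1, max(count for count, _ in cells))  # total >= largest count
    return [n for n in range(lowest, upper + 1) if row_is_consistent(n, cells)]


for name, (total, cells) in ROWS.items():
    print(name, "consistent:", row_is_consistent(total, cells))
print("G1-GM2 denominators matching Table 2:", candidate_totals(G1_GM2_CELLS))
```

Under the shared-denominator assumption, only 336 observed embryos reproduces all three G1-GM2 percentages; this is an inference from the rounded figures, not a value stated in the source.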

GFP expression in germline GATA-1/GFP transgenic zebrafish
Microinjected zebrafish embryos were raised to sexual maturity
and mated. Progeny were tested by PCR to determine the frequency of
germline transmission of the GATA-1/GFP transgene. Nine of six
hundred and seventy two founder fish have transmitted GFP to the F1
generation. Examination of these fish by fluorescence microscopy
revealed that seven of eight lines expressed GFP in the ICM and in
circulating blood cells. GFP expression patterns in the ICM were
consistent with the RNA in situ hybridization patterns previously observed
for GATA-1 mRNA expression in zebrafish (Detrich et al., Proc Natl
Acad Sci USA 92:10713-7 (1995)). In the two lines where F2 transgenic
fish have been obtained, GFP expression in blood cells was observed in
50% of the progeny when a transgenic F2 was mated to a non-transgenic
fish. This indicated that GFP was transmitted to progeny in a Mendelian
fashion. Southern blot analysis showed that GFP transgene insertions
occurred at different sites in these two lines. In one line, transgenic fish
apparently carry 4 copies of the transgene and in the other line, 7 copies.
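The 50% figure is what a single hemizygous insertion predicts. The sketch below enumerates the equally likely gamete pairings of a hemizygous carrier ("Tg/-") crossed to a non-transgenic fish ("-/-"), and, as an illustration of how such a ratio could be tested, computes a Pearson chi-square statistic against a 1:1 expectation. The genotype labels and the progeny counts (52 and 48) are hypothetical; the source reports only the 50% proportion.

```python
from itertools import product

# Hypothetical genotype labels: "Tg" = transgene insertion, "-" = no insertion.
CARRIER = ("Tg", "-")         # hemizygous F2 transgenic parent
NON_TRANSGENIC = ("-", "-")   # wild-type mate

CRITICAL_1DF_P05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05


def expected_transgenic_fraction(parent_a, parent_b) -> float:
    """Fraction of equally likely gamete pairings carrying at least one Tg allele."""
    pairings = list(product(parent_a, parent_b))
    carriers = sum(1 for pair in pairings if "Tg" in pair)
    return carriers / len(pairings)


def chi_square_1_to_1(gfp_pos: int, gfp_neg: int) -> float:
    """Pearson chi-square statistic for observed counts against a 1:1 ratio."""
    expected = (gfp_pos + gfp_neg) / 2
    return sum((obs - expected) ** 2 / expected for obs in (gfp_pos, gfp_neg))


fraction = expected_transgenic_fraction(CARRIER, NON_TRANSGENIC)
stat = chi_square_1_to_1(52, 48)  # hypothetical progeny counts
print(fraction, round(stat, 2), stat < CRITICAL_1DF_P05)
```

A statistic below the critical value means the hypothetical counts do not depart significantly from the 1:1 segregation expected for a single Mendelian locus.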

Blood cells were collected from 48 hour transgenic fish by heart
puncture and a blood smear was observed by fluorescence microscopy.
Two distinct populations of fluorescent cells were observed in these
smears. As in the circulation of embryos that transiently express GFP,
one cell population was observed that was large and bright and another
that was smaller and less bright. Although the blood cells collected from
adult transgenic zebrafish showed some variability in fluorescence
intensity, they appeared to have uniform size. Blood cells collected from
non-transgenic fish showed no fluorescence.

In two day old transgenic zebrafish, weak GFP expression was
observed in the heart. GFP expression was also observed in the eyes and, 
in three of seven transgenic lines, in some neurons of the spinal cord. 
Expression in the eyes peaked between 30 and 48 hours pf and became 
extremely weak by day 4. It is thought that expression of GFP in eyes 
and neurons may replicate the authentic GATA-1 expression pattern. 

Examination of GFP expression in tissues of one month old fish 
showed that the head kidney contained a large number of fluorescent 
cells. This result suggests that the kidney is the site of adult 
erythropoiesis in zebrafish. It has been reported that GATA-1 is 
expressed in the testes of mice. Expression of GFP was not found in 
testes dissected from adult fish. It is possible that the disclosed GATA-1 
transgene constructs lack an enhancer required for testis expression of 
GATA-1. Other tissues including brain, muscle and liver had no
detectable level of GFP expression. 

FACS analysis of GATA-1/GFP transgenic fish
GFP expression in GATA-1/GFP transgenic fish allowed isolation
of a pure population of the earliest erythroid progenitor cells for in vitro
studies by fluorescence activated cell sorting. F1 transgenic embryos
were collected at the onset of GFP expression and cell suspensions were
prepared. Approximately 3.6% of the cell populations of whole
transgenic fish were fluorescence positive, as compared to 0.12% in the
non-transgenic controls. Based on the number of embryos used, FACS
analysis suggested approximately [illegible] progenitor cells per embryo
at 14 hours pf. The sorted cells were analyzed by RT-PCR, and the
results indicated that these cells were highly enriched for GATA-1 mRNA.
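The FACS percentages lend themselves to simple arithmetic. The sketch below compares the GFP-positive fraction in transgenic samples (3.6%) with the background in non-transgenic controls (0.12%) and estimates GFP-positive cells per embryo. Subtracting the control fraction, and the total-cell and embryo counts used here, are assumptions for illustration only; the source does not state how its per-embryo figure was derived.

```python
def signal_to_background(transgenic_fraction: float, control_fraction: float) -> float:
    """Ratio of GFP-positive event fractions, transgenic vs. non-transgenic."""
    return transgenic_fraction / control_fraction


def gfp_cells_per_embryo(transgenic_fraction: float, control_fraction: float,
                         total_cells: int, n_embryos: int) -> float:
    """Background-subtracted GFP-positive events divided by the number of embryos.

    Assumption: the control (autofluorescence) fraction contributes equally to
    the transgenic sample, so it is subtracted before scaling.
    """
    specific_fraction = max(transgenic_fraction - control_fraction, 0.0)
    return specific_fraction * total_cells / n_embryos


# Fractions from the text; the cell and embryo counts are hypothetical.
ratio = signal_to_background(0.036, 0.0012)
per_embryo = gfp_cells_per_embryo(0.036, 0.0012, total_cells=1_000_000, n_embryos=100)
print(f"{ratio:.0f}-fold over background, ~{per_embryo:.0f} GFP+ cells per embryo")
```

With these illustrative inputs the transgenic signal is about 30-fold above background, which is the kind of margin that makes sorting a nearly pure GFP-positive population practical.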

GFP expression was observed in living embryos during early
development. Fluorescent circulating blood cells could still be observed
in two month old fish. Germline transgenic fish obtained from the
injected founders continued to express GFP in erythroid cells in the F1
and F2 generations. The GFP expression patterns in transgenic fish were
consistent with the RNA in situ hybridization pattern generated for
GATA-1 mRNA expression. These transgenic fish allowed isolation, by
fluorescence activated cell sorting, of the earliest erythroid progenitor
cells. Using constructs containing another zebrafish promoter and GFP,
it will be possible to generate transgenic fish that allow continuous
visualization of the origin and migration of any lineage specific
progenitor cells in a living embryo.

The results described in this example indicate that monitoring
GFP expression can be a more sensitive method than RNA in situ
hybridization by which to determine gene expression patterns. For
instance, in the disclosed GATA-1/GFP transgenic fish, GFP expression
in circulating blood allowed two types of cells to be distinguished. One
cell type was larger and brighter; the other smaller and less bright. There
were fewer of the larger, brighter cell type. These cells are believed to
be erythroid precursors, while the more abundant, smaller cells are
believed to be fully differentiated erythrocytes. Preliminary cell
transplantation experiments with embryonic blood cells have shown that
they contain a cell population that has long-term proliferation capacity.
In two day old transgenic zebrafish, GFP expression was observed
in the heart. In adult transgenic zebrafish, GFP expression was observed
in the kidney. By histological methods, it has been shown that the heart
endocardium is a transitional site for hematopoiesis in embryonic
zebrafish and that the kidney is the site of adult hematopoiesis
(Al-Adhami and Kunz, Develop. Growth and Differ. 19:171-179 (1977)).
The results in GATA-1/GFP transgenic fish support these observations.

The GFP expression seen in the eyes and neurons of embryonic 
transgenic fish may be due to a lack of a transcriptional silencer in the 
transgene constructs. It seems unlikely that the GFP expression in the
eyes is due to positional effects caused by the sites of insertion since all 
seven transgenic lines have GFP expression in embryonic fish eyes. 

Using fluorescence activated cell sorting, pure populations of
hematopoietic progenitor cells were isolated from the ICM of transgenic
zebrafish. Since approximately 10^[illegible] cells can be sorted per hour,
10^[illegible] to 10^[illegible] purified ICM cells can be obtained in a few
hours. These cells, which are derived from the earliest site of
hematopoiesis in zebrafish, can be used in a variety of in vitro studies.
For instance, these pure cell populations can provide mRNA for
differential display or subtractive screens for identifying novel
hematopoietic genes. Erythroid precursors obtained from the ICM might
also be established in tissue culture. This would allow the growth factor
needs of these cells to be determined.
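The yield of a sort depends on throughput, on the fraction of events that are GFP positive, and on recovery. The sketch below estimates the sorting time needed to collect a target number of GFP-positive cells. The throughput of 10^7 events per hour, the target of 100,000 cells, and the full-recovery default are hypothetical (the exponents in the source are illegible), and it assumes "cells sorted per hour" means all events analyzed; the 3.6% positive fraction is taken from the FACS analysis above.

```python
def hours_to_collect(target_cells: int, events_per_hour: float,
                     positive_fraction: float, recovery: float = 1.0) -> float:
    """Sorting time needed to collect `target_cells` GFP-positive cells.

    events_per_hour: cells analyzed per hour (instrument-dependent; hypothetical)
    positive_fraction: fraction of events that are GFP positive
    recovery: fraction of sorted positive cells actually recovered (assumed)
    """
    rate = events_per_hour * positive_fraction * recovery
    if rate <= 0:
        raise ValueError("sorting rate must be positive")
    return target_cells / rate


# Hypothetical throughput; 0.036 is the GFP-positive fraction reported above.
t = hours_to_collect(target_cells=100_000, events_per_hour=1e7, positive_fraction=0.036)
print(f"{t:.2f} h")
```

Lowering `recovery` to reflect cell losses during sorting lengthens the estimate proportionally.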

The approach to obtaining and studying transgene expression in
erythroid cells described above is generally applicable to the study of any
developmentally regulated process. This approach can also be applied to
the identification of cis-acting promoter elements that are required for
tissue specific gene expression (see Example 2). The analysis of
promoter activity in a whole animal is desirable since dynamic temporal
and spatial changes in a cellular microenvironment can be only poorly
mimicked in vitro. The ease of generating and maintaining a large
number of transgenic zebrafish lines makes obtaining statistically
significant results practical. Finally, transgenic zebrafish that express
GFP in specific tissues provide useful markers for identifying mutations
that affect these tissues. Together with the embryological methods
available for zebrafish, transgenic zebrafish exhibiting tissue-specific
GFP expression are a very valuable tool for dissecting developmental
processes.

Example 2: Identification of Enhancers in GATA-2 Expression Sequences

A large number of studies have shown that neuronal cell
determination in invertebrates occurs in progressive waves that are
regulated by sequential cascades of transcription factors. Much less is
known about this process in vertebrates. It was realized that an integrated
approach combining embryological, genetic and molecular methods, such
as that used to study neurogenesis in Drosophila (Ghysen et al., Genes
Dev 7:723-33 (1993)), would facilitate the identification of the molecular
mechanisms involved in specifying neuronal fates in vertebrates. The
following is an example of identification of cis-acting sequences that
control neuron-specific gene expression in a vertebrate. Such
identification is an initial step toward unraveling similar cascades in a
vertebrate.






Transcription factors bind to cis-acting DNA sequences
(sometimes referred to as response sequences) to regulate transcription.
Often these transcription factors are members of multigene families that
have overlapping, but distinct, expression patterns and functions. The
transcription factor GATA-2 is a member of such a gene family
(Yamamoto et al., Genes Dev 4:1650-62 (1990)). Each member of the
GATA gene family is characterized by its ability to bind to cis-acting
DNA elements with the consensus core sequence WGATAR (Orkin,
Blood 80:575-81 (1992); SEQ ID NO:18). All protein products of the
GATA family contain two copies of a highly conserved structural motif
commonly known as the zinc finger (Martin and Orkin, Genes Dev
4:1886-98 (1990)). Six members of the GATA family have been
identified in vertebrates (Orkin, Blood 80:575-81 (1992), Orkin, Curr
Opin Cell Biol 7:870-7 (1995)). Pannier, another member of the GATA
gene family, is expressed in Drosophila neuronal precursors and inhibits
expression of achaete-scute, a gene complex that plays a critical role in
neurogenesis in Drosophila (Ramain et al., Development 119:1277-91
(1993)).
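The consensus WGATAR uses IUPAC ambiguity codes (W = A or T; R = A or G). The sketch below scans a DNA sequence for this core on both strands, reporting each hit by its leftmost top-strand position. The example sequence and the function names are hypothetical; a motif match indicates only a potential GATA-binding site, not demonstrated binding.

```python
import re

# WGATAR with IUPAC codes expanded: W = [AT], R = [AG].
# The lookahead lets overlapping sites be reported.
GATA_CORE = re.compile(r"(?=([AT]GATA[AG]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE_LEN = 6


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_wgatar(seq: str) -> list[tuple[int, str, str]]:
    """Return (1-based leftmost top-strand position, strand, site read 5'->3')."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in GATA_CORE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in GATA_CORE.finditer(rc):
        # A site at rc[s : s + 6] occupies top-strand 1-based positions
        # len(seq) - s - 5 .. len(seq) - s.
        hits.append((len(seq) - (m.start() + SITE_LEN) + 1, "-", m.group(1)))
    return sorted(hits)


example = "CCAGATAAGGTTATCTCC"  # hypothetical sequence with one site per strand
print(find_wgatar(example))
```

Scanning both strands matters because GATA factors bind double-stranded DNA, so a site written as TTATCT on the top strand is the same element as AGATAA on the bottom strand.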

In chicken and mouse, the transcription factor GATA-2 is
expressed in hematopoietic precursors, immature erythroid cells,
proliferating mast cells, the central nervous system (CNS), and
sympathetic neurons (Yamamoto et al., Genes Dev 4:1650-62 (1990),
Orkin, Blood 80:575-81 (1992), Jippo et al., Blood 87:993-8 (1996)).
Studies in zebrafish (Detrich et al., Proc Natl Acad Sci USA 92:10713-
7 (1995)) and Xenopus (Zon et al., Proc Natl Acad Sci USA 88:10642-
6 (1991), Kelley et al., Dev Biol 165:193-205 (1994)) have also shown
that GATA-2 expression is restricted to hematopoietic tissues and the
CNS. Homozygous null mutants, created in mouse via homologous
recombination, have profound deficits in all hematopoietic lineages (Tsai
et al., Nature 371:221-6 (1994)). The role played by GATA-2 in
neuronal tissue of these mice has not been carefully examined, perhaps
because the embryos die before day E11.5. Analysis of GATA-2
expression in chick embryonic neuronal tissue after notochord ablation has
suggested that GATA-2 plays a role in specifying a neurotransmitter
phenotype (Groves et al., Development 121:887-901 (1995)). In addition,
GATA factors are required for activity of the neuron-specific enhancer of
the gonadotropin-releasing hormone gene (Lawson et al., Mol Cell Biol
16:3596-605 (1996)).

The effects of various hematopoietic growth factors on GATA-2
expression have been carefully studied in tissue culture systems (Weiss et
al., Exp Hematol 23:99-107 (1995)), and some growth factors have been
shown to have dramatic effects on early embryonic GATA-2 expression
(Walmsley et al., Development 120:2519-29 (1994), Maeno et al., Blood
88:1965-72 (1996)). In addition, nuclear translocation of a maternal
CCAAT binding transcription factor has been shown to be necessary for
the onset of GATA-2 transcription at the mid-blastula transition in
Xenopus (Brewer et al., EMBO J 14:757-66 (1995)). However, prior to
the disclosed work, nothing was known about the mechanisms that control
neuron-specific expression of this gene.

Cloning and sequencing of the 5' part of GATA-2 genomic DNA
A zebrafish genomic phage library was screened with the
conserved zinc finger domain of zebrafish GATA-2 cDNA radiolabeled
with 32P. Two positive clones, λGATA-21 and λGATA-22, were
identified. Restriction fragments of λGATA-21 were subcloned into
pBluescript II KS(-). DNA sequence of the resulting clones was obtained
from -4807 to +2605 relative to the GATA-2 translation start site (SEQ
ID NO:27). Unless otherwise indicated, positions within the GATA-2
clones are given relative to the translation start site. The GATA-2
upstream region ending at position -1 was amplified by the polymerase
chain reaction (PCR) using the Expand Long Template PCR System
(Boehringer Mannheim) for 25 cycles (94°C, 30 seconds; 68°C, 8
minutes). Primers used were a T7 primer and a primer specific for
sequences 5' to the GATA-2 translation start site
(5'-ATGGATCCTCAAGTGTCCGCGCTTAGAA-3'; SEQ ID NO:19). The
GATA-2 specific primer contained a BamHI site to facilitate subsequent
cloning. The PCR product (P1) was cloned into the SmaI/BamHI sites of
pBluescript II KS(-).
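The GATA-2 specific primer (SEQ ID NO:19) carries a BamHI site (GGATCC) near its 5' end for cloning. The sketch below splits the primer into its 5' tail, the restriction site, and the remaining 3' portion, and reports the GC content of that portion. It assumes, without confirmation in the source, that the bases 3' of the BamHI site are the GATA-2 annealing sequence; `split_primer` is an illustrative helper.

```python
PRIMER = "ATGGATCCTCAAGTGTCCGCGCTTAGAA"  # SEQ ID NO:19, written 5'->3'
BAMHI = "GGATCC"  # BamHI recognition site (cuts G^GATCC)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_primer(primer: str, site: str) -> tuple[str, str, str]:
    """Split a cloning primer into 5' tail, restriction site and 3' remainder."""
    i = primer.find(site)
    if i < 0:
        raise ValueError(f"{site} not found in primer")
    return primer[:i], primer[i:i + len(site)], primer[i + len(site):]


tail, site, anneal = split_primer(PRIMER, BAMHI)
gc = sum(base in "GC" for base in anneal) / len(anneal)
print(tail, site, anneal, len(anneal), f"GC={gc:.0%}")
print("palindromic site:", reverse_complement(BAMHI) == BAMHI)
```

Because GGATCC is its own reverse complement, the same site is recognized on either strand, which is why a single primer-borne BamHI site suffices for directional cloning against the SmaI blunt end.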
Plasmid constructs 

The 7.3 kb DNA fragment containing the putative GATA-2
expression sequences (P1) was ligated to a modified GFP reporter gene
(GM2, described above), resulting in construct P1-GM2 (Figure 3).
Based on P1-GM2, constructs containing successive 5' deletions in the
region upstream of the transcription start site were generated using five
restriction sites in this upstream region, including SacI and ScaI
(Figure 3). Constructs nsP5-GM2 and nsP6-GM2 were generated by
ligating the 1116 bp fragment containing the GATA-2 neuron-specific
enhancer, from -4807 to -3692, to P5-GM2 and P6-GM2, respectively
(Figure 4). The same fragment containing the neuron-specific enhancer
was also ligated to a 243 bp SphI/BamHI fragment of the Xenopus
elongation factor 1α (EF-1α) minimal promoter that had previously been
ligated to the GM2 gene, resulting in construct ns-Xs-GM2 (Figure 4).
The EF-1α minimal promoter has been described in Johnson and Krieg,
Gene 147:223-6 (1994).

PCR mapping of neuron-specific enhancer
PCR technology was exploited to create a deletion series within
the 1116 bp neuron-specific enhancer using nsP5-GM2 as a template. A
total of 10 specific 22-mer primers were synthesized. These included
ns4647, ns4493, ns4292, ns4092, ns3990, ns3872, ns3851, ns3831,
ns3800 and ns3789, in which the numbers refer to the positions of their 5'
end base in the GATA-2 genomic sequence. A T7 primer was also used
in the PCR reactions. The amplified fragments all contained the GM2
gene and SV40 polyadenylation signal in addition to the GATA-2
expression sequences. PCR reactions were performed using the Expand™
Long Template PCR System (Boehringer Mannheim) for 25 cycles (94°C,
30 seconds; 55°C, 30 seconds; 72°C, 2 minutes). The PCR products
were purified with the GENECLEAN II Kit (Bio 101 Inc.) and
subsequently used for microinjection.

After a 31 bp neural-specific enhancer was identified, five
additional primers, each containing 2 or 3 mutant bases relative to the
wild type enhancer sequence, were designed. These primers are listed
below together with the wild type primer ns3831 (mutant bases are shown
in lowercase):

ns3831    5'-TCTGCGCCGCTTTCTGCCCCCTCCTGCCCTCTT-3' (SEQ ID NO:20)
ns3831M1  5'-TCTGCGaaGCTTTCTGCCCCCTCCTGCCCTCTT-3' (SEQ ID NO:21)
ns3831M2  5'-TCTGCGCCGCTTTCTGaaCCCTCCTGCCCTCTT-3' (SEQ ID NO:22)
ns3831M3  5'-TCTGCGCCGCTTTCTGCCaaCTCCTGCCCTCTT-3' (SEQ ID NO:23)
ns3831M4  5'-TCTGCGCCGCTTTCTGCCCCaaaCTGCCCTCTT-3' (SEQ ID NO:24)
ns3831M5  5'-TCTGCGCCGCTTTCTGCCCCC[illegible]CCCTCTT-3' (SEQ ID NO:[illegible])

These primers were used in conjunction with the T7 primer for PCR
amplification of the target sequence using nsP5-GM2 as the template.
PCR conditions were identical to those described above.
Microinjection of zebrafish 

Wild-type zebrafish were used for all microinjections. Plasmid 
DNA was linearized using single-cut restriction sites in the vector 
backbone, purified using the GENECLEAN II Kit (Bio 101 Inc.) and
resuspended in 5 mM Tris, 0.5 mM EDTA, 0.1 M KCl at a final
concentration of 100 µg/ml. Single cell embryos were microinjected as
described above. Each construct was injected independently 2 to 5 times
and the data obtained were pooled. 

Fluorescent microscopic observation 

Embryos were anesthetized using tricaine as described above and
examined under a FITC filter on a Zeiss microscope equipped with a
video camera. Pictures showing GFP positive cells in living embryos
were generated by superimposing a bright field image on a fluorescent
image using Adobe Photoshop software.
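The overlay step above was done in Adobe Photoshop. As a swapped-in alternative, not the method used in the source, the sketch below shows one way to superimpose a fluorescence image on a bright field image with NumPy: the bright field frame becomes a gray RGB image and the GFP signal is added to the green channel only. It assumes the two images are already registered, have the same shape, and are scaled to [0, 1]; `overlay_gfp` and its `gain` parameter are hypothetical.

```python
import numpy as np


def overlay_gfp(brightfield: np.ndarray, gfp: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Merge a grayscale bright field image with a GFP channel shown in green.

    Both inputs are 2-D arrays of identical shape with values in [0, 1].
    Returns an RGB array of shape (H, W, 3) with values in [0, 1].
    """
    if brightfield.shape != gfp.shape:
        raise ValueError("images must have the same shape")
    # Copy the bright field intensity into all three color channels (gray).
    rgb = np.repeat(brightfield[..., np.newaxis], 3, axis=2).astype(float)
    # Add fluorescence to the green channel only, then clip to the valid range.
    rgb[..., 1] += gain * gfp
    return np.clip(rgb, 0.0, 1.0)


bf = np.full((2, 2), 0.5)                 # hypothetical uniform bright field
fluor = np.array([[0.0, 0.0], [0.0, 1.0]])  # one hypothetical GFP-positive pixel
merged = overlay_gfp(bf, fluor)
print(merged[1, 1], merged[0, 0])
```

An additive green channel keeps the anatomy visible in gray while GFP-positive cells stand out, which matches the purpose of the superimposed pictures described above.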

Whole-mount RNA in situ hybridization 

Sense and antisense digoxigenin-labeled RNA probes were 
generated from a GATA-2 cDNA subclone containing a 1 kb fragment of 
the 5' coding sequence using DIG/Genius™ 4 RNA Labeling Kit 
(SP6/T7) (Boehringer Mannheim). RNA in situ hybridizations were
performed as described by Westerfield (The Zebrafish Book (University
of Oregon Press, 1995)).

Isolation of GATA-2 genomic DNA 

Two GATA-2 positive phage clones, λGATA-21 and λGATA-22,
were identified as described above. Preliminary restriction analysis
suggested that λGATA-21 contained a large region upstream of the
translation start codon. A 7412 bp region of this clone was sequenced,
from -4807 to +2605 relative to the translation start site. The putative
GATA-2 expression sequences (P1), containing approximately 7.3 kb
upstream of the translation start site from λGATA-21, were subcloned
into a plasmid vector for expression studies.

Expression pattern of a modified GFP gene driven by the
putative GATA-2 promoter in zebrafish embryos

The construct P1-GM2 was generated by ligation of a modified 
GFP reporter gene (GM2) to P1 (Figure 3). This construct was injected
into the cytoplasm of single cell zebrafish embryos and GFP expression in
the microinjected embryos was examined at a number of distinct
developmental stages by fluorescence microscopy.

GFP expression was initially observed by fluorescence microscopy
at the 4000 cell stage at about 4 hours post-injection (pi). At the dorsal
shield stage (6 hours pi), GFP expression was observed throughout the
prospective ventral mesoderm and ectoderm, but expression in the dorsal
shield was extremely rare. At 16 hours pi, GFP expression was observed
in the developing intermediate cell mass (ICM), the early hematopoietic
tissue of zebrafish. In addition, GFP expression could be seen in
superficial enveloping layer (EVL) cells at 4 hours pi. Expression in the
EVL peaked between 24 and 48 hours pi and became extremely weak by
day 7. GFP expression in neurons, including extended axons, was first
observed at 30 hours pi and was maintained at high levels through at least
day 8.

Embryos injected with the P1-GM2 construct expressed GFP in a
manner restricted to hematopoietic cells, EVL cells, and the CNS. The
GFP expression patterns in gastrulating embryos, in the blood progenitor
cells, and in neurons were consistent with the RNA in situ hybridization
patterns previously generated for GATA-2 mRNA expression in zebrafish 
(Detrich et al., Proc Natl Acad Sci USA 92:10713-7 (1995)). 
However, GATA-2 expression in EVL has not been detected by RNA in 
situ hybridizations. 

More than 95% of the embryos injected with P1-GM2 had tissue
specific GFP expression (Table 3). About 5% of these embryos had
non-specific GFP expression, limited to fewer than five cells per embryo.
These observations indicated that the DNA fragment extending
approximately 7.3 kb upstream from the GATA-2 translation start site
sufficed to correctly generate the embryonic tissue-specific pattern of
GATA-2 gene expression.

Table 3

Construct  No. embryos  No. embryos      No. embryos with   No. embryos with   No. embryos with
           observed     with expression  circulating blood  neuronal           EVL expression (%)
                                         expression (%)     expression (%)
P1-GM2     141          135              3 (2.13)           106 (75.2)         130 (92.2)
P2-GM2     198          177              32 (15.7)          136 (68.7)         175 (88.4)
P3-GM2     303          291              29 (9.6)           0 (0)              277 (91.4)
P4-GM2     143          126              21 (14.7)          0 (0)              118 (82.5)
P5-GM2     139          90               16 (11.5)          0 (0)              20 (14.4)
P6-GM2     138                           2 (1.4)            0 (0)              11 (8.0)



Gross mapping of tissue-specific enhancers

To identify the portions of the GATA-2 expression sequences that
are responsible for regulating tissue specific gene expression, several
constructs containing deletions in the promoter were generated (Figure 3).
Naturally occurring restriction sites were used to create a series of gross
deletions in the expression sequence region. Each construct was
individually microinjected into single cell embryos. The developing
embryos were observed by fluorescence microscopy at regular intervals
for several days.

Embryos injected with P2-GM2, which contains GATA-2
sequences from -4807 to +1, expressed GFP in a manner similar to
embryos injected with the original construct, P1-GM2 (Table 3). At 48
hr pi, GFP expression was observed in circulating blood cells, the CNS
and the EVL. However, careful observation of the injected embryos at
16 hr pi revealed that expression in the posterior end of the ICM was
nearly abolished. This suggested that an enhancer for GATA-2
expression in early hematopoietic progenitor cells may reside in the
deleted region. Expression of GFP in circulating blood cells increased
from approximately 2% to 16%, suggesting that a potential repressor for
expression of GATA-2 in erythrocytes may also reside in the deleted
region.

Embryos injected with P3-GM2, which contains GATA-2 
sequences from -3691 to +1, expressed GFP in circulating blood cells 
and in the EVL, but did not express in the CNS. Embryos injected with 
other constructs that lack the deleted 1116 bp region, extending from
-4807 to -3692, also had no GFP expression in the CNS (Table 3). It was
concluded that the 1116 bp region, extending from -4807 to -3692, 
contained a neuron-specific enhancer element. 

Embryos injected with P4-GM2, which contains GATA-2 
sequences from -2468 to +1, had a GFP expression pattern similar to 
those injected with P3-GM2. Injection with P5-GM2, which contains 
GATA-2 sequences from -1031 to +1, resulted in a sharp drop with 
respect to percentage of embryos expressing GFP in the EVL, but GFP 
expression in circulating blood cells was unaffected. This indicates that 
the 1437 bp region, extending from -2468 to -1032, contains an
EVL-specific enhancer. The 1031 bp segment present in P5-GM2 may
represent the minimal expression sequences necessary for the maintenance 
of tissue specific expression of GATA-2. 

Neuron-specific enhancer activity 

To confirm the neuron-specific enhancer activity of the 1116 bp 
region that spans from -4807 to -3692 of GATA-2, nsP5-GM2 was 
constructed by ligating the 1116 bp fragment to P5-GM2, which contains 
the 1031 bp region upstream of the translation start of GATA-2 gene 
operably linked to a sequence encoding GM2 (Figure 4). Approximately 
70% of the embryos injected with nsP5-GM2 had GFP expression in the 
CNS (Figure 5), while no embryos injected with P5-GM2 had GFP
expression in the CNS as noted in Table 3. This indicates that the 1116
bp region can effectively direct neuron-specific expression. 
To determine whether the 1116 bp neuron-specific enhancer
activity was context dependent, the construct ns-Xs-GM2 (Figure 4) was
generated by ligating the enhancer to the Xenopus elongation factor 1α
minimal promoter (Johnson and Krieg, Gene 147:223-6 (1994)) operably
linked to the sequence encoding GM2 (Xs-GM2; Figure 4). When
injected with Xs-GM2, embryos expressed GFP in various tissues
including muscle, notochord, blood cells and melanocytes. However, no
GFP expression was observed in the CNS (Figure 5). Injection with
ns-Xs-GM2 resulted in 8.5% of the embryos having GFP expression in the
CNS, far less than obtained by injection with nsP5-GM2 (Figure 5).
Another construct, nsP6-GM2 (Figure 4), had an additional 653 bp
deletion in the GATA-2 minimal expression sequence, extending from
-1031 to -378. Injection of nsP6-GM2 resulted in 6.2% of embryos
expressing GFP in the CNS (Figure 5). Injection with P6-GM2 resulted
in no GFP expression in the CNS (Table 3). These results suggest that
the 1116 bp enhancer has some ability to confer neuronal specificity on a
heterologous promoter, but requires proximal elements within its own
promoter to exert its full activity.

Fine mapping of a neuron-specific cis-acting regulatory element

To precisely map the putative neuron-specific enhancer, a series
of constructs containing progressive deletions in the 1116 bp DNA
fragment was generated by PCR, using nsP5-GM2 as the template. The
PCR products obtained were used directly for microinjection. The first
deletion series included ns4647, ns4493, ns4292, ns4092 and ns3990
(where the number indicates the upstream endpoint of the truncated
fragment). Microinjection of all 5 mutants gave a similar percentage of
embryos having GFP expression in the CNS (Figure 6). This indicated
that a neuron-specific enhancer resides within the 298 bp sequence (from
-3990 to -3692) contained in ns3990.

Next, two additional deletion constructs, ns3872 and ns3789, were
generated. As shown in Figure 6, over 60% of embryos injected with
ns3872 had GFP expression in the CNS, while embryos injected with
ns3789 lacked GFP expression in the CNS. This indicated that the
neuron-specific enhancer element was located within an 83 bp sequence
from -3872 to -3790.

Injection of embryos with three additional deletion constructs,
ns3851, ns3831 and ns3800, allowed localization of the neuron-specific
enhancer element to a 31 bp pyrimidine-rich sequence. This element has
the sequence
5'-TCTGCGCCGCTTTCTGCCCCCTCCTGCCCTC-3' (nucleotides 1 to
31 of SEQ ID NO:20), which extends from -3831 to -3801 within the
GATA-2 genomic DNA.
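The narrowing logic of the 5' deletion series can be written down explicitly. Each construct is identified by the distance (bp upstream of the translation start) of its 5' end, and all share the same 3' end. The sketch below reports the window between the shortest construct that is still active and the longest construct that is inactive. The activity calls are read from the text (ns3851 and ns3831 are taken as active and ns3800 as inactive, which is how the text arrives at -3831 to -3801); `critical_window` is a hypothetical helper. Strictly, a 5' deletion series bounds where essential sequence begins, so confining the whole element to this window is an interpretation.

```python
def critical_window(active: set[int], inactive: set[int]) -> tuple[int, int, int]:
    """Window in which activity is lost across a 5' deletion series.

    Positions are distances upstream of the translation start, so the
    window runs from -upstream_end to -downstream_end. Returns
    (upstream_end, downstream_end, length_bp), both ends inclusive.
    """
    shortest_active = min(active)
    shorter_inactive = [d for d in inactive if d < shortest_active]
    if not shorter_inactive:
        raise ValueError("need an inactive construct shorter than the shortest active one")
    downstream_end = max(shorter_inactive) + 1  # first base absent from that construct
    return shortest_active, downstream_end, shortest_active - downstream_end + 1


FIRST_AND_SECOND = ({4647, 4493, 4292, 4092, 3990, 3872}, {3789})
WITH_THIRD = ({4647, 4493, 4292, 4092, 3990, 3872, 3851, 3831}, {3800, 3789})

for active, inactive in (FIRST_AND_SECOND, WITH_THIRD):
    start, end, length = critical_window(active, inactive)
    print(f"-{start} to -{end} ({length} bp)")
```

Adding constructs whose 5' ends fall inside the current window is what shrinks it, which is why the third set of deletions was placed between -3872 and -3790.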

Site directed mutagenesis within neuron-specific enhancer
element

To determine the core sequence necessary for the activity of the
neuron-specific element, five primers, each having two to three altered
nucleotides within the 31 bp neuron-specific element (see above), were
used to amplify nsP5-GM2. The PCR products obtained were directly
injected into single cell embryos. This 31 bp sequence contains an
Ets-like recognition site (AGGAC) in an inverted orientation, which is
present in several neuron-specific promoters (Chang and Thompson, J
Biol Chem 271:6467-75 (1996), Charron et al., J Biol Chem
270:30604-10 (1995)). Therefore, four of the primers used in these PCR
reactions contain altered nucleotides within the Ets-like recognition site
or in the adjacent sequence. As expected, embryos injected with
ns3831M1, which contains two mutant nucleotides that are thirteen
nucleotides upstream of the Ets-like recognition site, showed little change
in neuron-specific GFP expression (Figure 7). A mutation of 2
nucleotides (ns3831M2) that lie three nucleotides upstream of the
Ets-like recognition site had no effect on enhancer activity (Figure 7).
Mutation of two nucleotides just one nucleotide upstream of the Ets-like
motif, contained in ns3831M3, completely eliminated the neuron-specific
enhancer activity of the 31 bp element (Figure 7). Mutation of three
nucleotides (ns3831M4), of which two lie within the Ets-like recognition
site, also resulted in a sharp decrease in enhancer activity (Figure 7). A
mutation of two nucleotides that lie within the Ets-like recognition site
(ns3831M5) reduced the neuron-specific enhancer activity of the 31 bp
element by approximately 50% (Figure 7). From this it was concluded
that a CCCTCCT motif, which partially overlaps the Ets-like recognition
site within the 31 bp sequence, is absolutely required for neuron-specific
enhancer activity.
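The positional statements above can be checked directly against the primer sequences. The sketch below compares each legible mutant primer with the wild type ns3831 primer, lists the mutated positions (1-based), and reports their distance from the Ets-like site. The text calls the site AGGAC "in an inverted orientation"; the sketch assumes the reading that matches this primer, namely that the top strand carries the base-by-base complement TCCTG (positions 22 to 26). Mutant bases are written in lowercase as in the primer list; ns3831M5 is omitted because its sequence is not fully legible.

```python
WT = "TCTGCGCCGCTTTCTGCCCCCTCCTGCCCTCTT"  # ns3831 (SEQ ID NO:20)
MUTANTS = {
    "ns3831M1": "TCTGCGaaGCTTTCTGCCCCCTCCTGCCCTCTT",
    "ns3831M2": "TCTGCGCCGCTTTCTGaaCCCTCCTGCCCTCTT",
    "ns3831M3": "TCTGCGCCGCTTTCTGCCaaCTCCTGCCCTCTT",
    "ns3831M4": "TCTGCGCCGCTTTCTGCCCCaaaCTGCCCTCTT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed reading of "inverted": the top strand carries the complement of AGGAC.
ETS_TOP = "AGGAC".translate(COMPLEMENT)   # "TCCTG"
ETS_START = WT.index(ETS_TOP) + 1         # 1-based first base of the site
ETS_END = ETS_START + len(ETS_TOP) - 1


def mutated_positions(wt: str, mutant: str) -> list[int]:
    """1-based positions where the mutant primer differs from the wild type."""
    return [i + 1 for i, (a, b) in enumerate(zip(wt, mutant.upper())) if a != b]


def describe(positions: list[int]) -> str:
    """Relate mutated positions to the Ets-like site.

    Assumes mutations that miss the site lie 5' (upstream) of it.
    """
    inside = [p for p in positions if ETS_START <= p <= ETS_END]
    if inside:
        return f"{len(inside)} of {len(positions)} changes inside the site"
    return f"{ETS_START - max(positions) - 1} nt upstream of the site"


for name, seq in MUTANTS.items():
    pos = mutated_positions(WT, seq)
    print(name, pos, describe(pos))
```

The computed distances (13, 3 and 1 nucleotides upstream, and two of three changes inside the site for ns3831M4) agree with the descriptions in the text, and they place the activity-abolishing ns3831M3 changes at positions 19 and 20, within the CCCTCCT motif (positions 19 to 25).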

Thus, dissection of expression sequences using transgenic fish, as
exemplified above in zebrafish with GATA-2, provides a system that
allows the rapid and efficient identification of the cis-acting elements
that play key roles in modulating the expression of developmentally
regulated genes. Identification of these cis-acting elements is a useful
step toward determining the genes that operate earlier than the gene
under study in the specification of a developmental pathway (since the
identified distal regulatory elements interact with transcription factors
that must be expressed for the regulatory elements to function).

Careful analysis of GATA-2 promoter activity in zebrafish
embryos revealed three distinct tissue specific enhancer elements. These
three elements appear to act independently to enhance gene expression
specifically in blood precursors, the EVL, or the CNS. Deletion of one
or two of the elements will generate transgene constructs that can drive
expression of a gene of interest in a specific tissue. Such constructs also
allow study of the tissue-specific function of genes expressed in multiple
tissues.



It has been shown that the developmental regulation of the
mammalian HOX6 and GAP-43 promoter activities is conserved in
zebrafish (Westerfield et al., Genes Dev 6:591-8 (1992), Reinhard et al.,
Development 120:1767-75 (1994)). If the same neuron-specific element
identified in the zebrafish GATA-2 promoter is also shown to be required
for neuron-specific activity of the mouse promoter, one could specifically
knock out expression of GATA-2 in the mouse CNS by targeting this
cis-element. This would allow one to determine precisely the role that
GATA-2 plays in the CNS.

The neuron-specific enhancer element of GATA-2 has been 
precisely mapped and found to contain the core DNA consensus sequence 
for binding by Ets-related transcription factors. Although Ets-related 
factors have been implicated in the regulation of expression of a number 
of neuron-specific genes (Chang and Thompson, /. Biol Chem 271:6467- 
75 (1996), Charron et al., J. Biol Chem 270:30604-10 (1995)), another 
sequence, CCTCCT, present in this region of the zebrafish GATA-2 
promoter was found to be required for expression in the CNS. This motif 
partially overlaps an inverted form of the core sequence of the Ets DNA 
binding recognition site. As has been shown for other genes, the 
activities of Ets family proteins often rely more on their ability to interact
with other transcription factors than on specific binding to a cognate DNA
sequence (Crepieux et al., Crit. Rev. Oncog. 5:615-38 (1994)). It is
possible that an independent factor that binds to the CCTCCT motif is
required for neuron-specific activity of the GATA-2 promoter.
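The sequence listing includes a 33-bp sequence containing CCTCCT (SEQ ID NO:20, repeated as SEQ ID NO:25). It also includes four variants (SEQ ID NOs:21-24) that differ from it only by short substitutions. This passage does not state that these variants are the mutants used to map the element; the sketch assumes so. The Python sketch below locates CCTCCT and reports which substituted positions fall inside it. It compares sequences only and makes no prediction about expression.

```python
from typing import List

# Sequences transcribed from the sequence listing, with spaces removed.
WILD_TYPE = "TCTGCGCCGCTTTCTGCCCCCTCCTGCCCTCTT"  # SEQ ID NO:20 (identical to NO:25)
VARIANTS = {
    21: "TCTGCGAAGCTTTCTGCCCCCTCCTGCCCTCTT",
    22: "TCTGCGCCGCTTTCTGAACCCTCCTGCCCTCTT",
    23: "TCTGCGCCGCTTTCTGCCAACTCCTGCCCTCTT",
    24: "TCTGCGCCGCTTTCTGCCCCAAACTGCCCTCTT",
}
MOTIF = "CCTCCT"


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 1-based start positions of every (possibly overlapping) exact match."""
    k = len(motif)
    return [i + 1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def substituted_positions(ref: str, alt: str) -> List[int]:
    """Return 1-based positions at which two equal-length sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have equal length")
    return [i + 1 for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def main() -> None:
    starts = find_motif(WILD_TYPE, MOTIF)
    # Every wild-type position covered by an occurrence of the motif.
    motif_positions = {p for s in starts for p in range(s, s + len(MOTIF))}
    print(f"{MOTIF} found at 1-based start(s): {starts}")
    for seq_id, seq in VARIANTS.items():
        changed = substituted_positions(WILD_TYPE, seq)
        inside = sorted(motif_positions.intersection(changed))
        print(f"SEQ ID NO:{seq_id}: changed {changed}, inside motif {inside}, "
              f"motif intact: {MOTIF in seq}")


if __name__ == "__main__":
    main()
```

Run on the listed sequences, this finds CCTCCT at positions 20-25. The substitutions in SEQ ID NOs:23 and 24 fall within the motif, whereas those in SEQ ID NOs:21 and 22 lie upstream of it.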

A number of growth factors are known to affect early embryonic
expression of GATA-2. Noggin and activin, which both have dorsalizing
activity in Xenopus embryos, downregulate GATA-2 expression in dorsal
mesoderm (Walmsley et al., Development 120:2519-29 (1994)). BMP-4
activates GATA-2 expression in ventral mesoderm and is probably
important to early blood progenitor proliferation (Maeno et al., Blood
88:1965-72 (1996)). Growth factors that might affect expression of
GATA-2 in neurons are not known. However, both BMP-2 and BMP-6
can activate neuron-specific gene expression (Fann and Patterson, J.
Neurochem. 63:2074-9 (1994)). Consistent with studies on growth factors
that upregulate or downregulate GATA-2 expression, GATA-2 promoter
activity was excluded from the zebrafish dorsal shield. It has also been
discovered that lithium chloride treatment dorsalizes the injected embryos
and dramatically reduces GATA-2 promoter activity as determined by
GFP expression.
Although GATA-2 expression has not been observed in the EVL
by in situ hybridization on whole embryos, this may be due to the
conditions used. In mouse, GATA-2 expression in embryonic mast cells has
only been detected by in situ hybridization performed on skin tissue, and
expression of GATA-2 in mouse skin mast cells occurs only during a short
period of embryogenesis, similar to what has been found for EVL cells in
zebrafish. It is possible that the constructs used in this example may be
missing elements that would specifically silence GATA-2 expression in
the zebrafish EVL.

The method described above is generally applicable to the
dissection of any developmentally regulated vertebrate promoter.
Tissue-specific and growth factor response elements can be rapidly
identified in this manner. The fact that zebrafish typically produce
hundreds of fertilized eggs per mating facilitates obtaining statistically
significant results. While tissue culture systems have been useful for
identifying many important transcription factors, transfection analysis in
tissue culture cells cannot simulate the complex, rapidly changing
microenvironment to which the promoter must respond during
embryogenesis. Temporal and spatial analysis of promoter activity can be
only poorly mimicked in vitro. The system described herein allows
complete analysis of promoter activity in all tissues of a whole vertebrate.
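Large clutches make it practical to score many injected embryos per construct. One standard way to ask whether two constructs differ in the fraction of embryos expressing the reporter is a two-sided Fisher exact test on a 2x2 table. The source does not say which statistical test was used, so this is a swapped-in choice. The stdlib-only Python sketch below implements the test with `math.comb`. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: construct 1, construct 2. Columns: embryos with / without reporter
    expression. Sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that construct 1 contributes x of the col1 expressing embryos.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one in the sum.
    p_values = (prob(x) for x in range(lo, hi + 1))
    return min(1.0, sum(p for p in p_values if p <= observed * (1 + 1e-9)))


# Hypothetical counts: 45 of 60 embryos GFP-positive for one construct,
# 12 of 58 for a deletion derivative (illustration only, not data from this study).
print(f"p = {fisher_exact_two_sided(45, 15, 12, 46):.2e}")
```

An exact test avoids the large-sample approximation of a chi-square test. This matters when a deletion construct yields very few expressing embryos.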
SEQUENCE LISTING
(i) APPLICANT: MEDICAL COLLEGE OF GEORGIA RESEARCH FOUNDATION
(ii) TITLE OF INVENTION: TRANSGENIC FISH WITH TISSUE-SPECIFIC 
EXPRESSION 
(iii) NUMBER OF SEQUENCES: 27 
(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Patrea L. Pabst 

(B) STREET: 2800 One Atlantic Center 

1201 West Peachtree Street

(C) CITY: Atlanta 

(D) STATE: GA 

(E) COUNTRY: USA 

(F) ZIP: 30309-3450 
(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25
(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 
(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Pabst, Patrea L. 

(B) REGISTRATION NUMBER: 31,284 

(C) REFERENCE/DOCKET NUMBER: MCG100
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (404) 873-8794

(B) TELEFAX: (404) 873-8795

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: no 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CCGGATCCTG CAAGTGTAGT ATTGAA 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AATGTATCAA TCATGGCAGA C 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
TGTATAGTTC ATCCATGCCA TGTG 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
ATGAACCTTT CTACTCAAGC T 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GCTGCTTCCA CTTCCACTCA T 

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
AGACACAGTC CAGGTGAGTC CAA 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
CTTTCGCCAC CTGGTATGTT GTG 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
AAAAAGAGGC TGGTATGTAA AA

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: no 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
AAACTGCACA ATGTGAGTAT AC 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
ATTAAAACAG TTCGCCAAGT C 

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

AATTTTACAG AGGCTCGTGA A 

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 



CCTGCATCAG ATTGTCAGCA AA 

(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CTTTTTGCAG GTCAACAGGC CT 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Arg His Ser Pro Val Arg Gln Val
(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Leu Ser Pro Pro Glu Ala Arg Glu

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
Lys Lys Arg Leu Ile Val Ser Lys

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
Lys Leu His Asn Val Asn Arg Pro

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
Trp Gly Ala Thr Ala Arg

(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 28 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ATGGATCCTC AAGTGTCCGC GCTTAGAA
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TCTGCGCCGC TTTCTGCCCC CTCCTGCCCT CTT 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
TCTGCGAAGC TTTCTGCCCC CTCCTGCCCT CTT 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TCTGCGCCGC TTTCTGAACC CTCCTGCCCT CTT 

(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
TCTGCGCCGC TTTCTGCCAA CTCCTGCCCT CTT 



(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: no 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TCTGCGCCGC TTTCTGCCCC AAACTGCCCT CTT 
(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 33 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
TCTGCGCCGC TTTCTGCCCC CTCCTGCCCT CTT 
(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
OAATTCTAGT TCTAGGGTAA ACTATACAGT TTTTTTAATT 'aATAAAGT^O GI^GTAA 
ATCTC^TAA TGAGTAAGTC AC^TCAT ^ATTCATTT GATTTGTTCA AACAGTT..AT 
TCATTTAGAA ATTCATTAGA AATCAARCTG CAGTCTTTAT GAACCACCCG TTAAACCTTT 
AGTTTA^T^ ATTGOAATCA AAACCCCACT G^TGTTAAT CAGATGAATG CTGAAAAOCA 
CAGACAGGT. T^AATCCATC ATGCCATTCC TTCTAGAAAG GAAACATTAG TAATGGTTTT 
™CAGC ATTTTAATAA CCACAAGCAC AZ^CTAATO CAATGAAATC ATATTTOCAA 
ACCAAAACAG CT.ATTCT.. AAATGGCCTA CACAGAGTCC AGACCTGAAT ATTATAGAGA 
^CTGCAGTA TCACTTGAAA GAAAAATAAA CATTAATCTT AAATCTAAAG AACTTAAATC 
TAAAGAAGCA CTATGAGAAA TGCTGAAAAa GCCTGATTTT ACATAGCACA T^ATTTAAAA 
TGAAACCTCA GG.ACAGTAT ACAOAACAGT TCAAATACAG TATACAGTAA ACAGAACAGG 
TCAGGTCACA CCAAATACTG GCAAGCCATT ^ATTCTGAA AAI.TTTCAT TTAGATTAGA 
ACAGAAGAAC TANAGAGACC NNNAAAGTTG GCTGAATATA AATAAATATA CCACI.CTTT 
OACCGYTCTA GACTTT^GCA CAGTACTTAA A^CAGTACT TAAAGTAATT CNTCATI.AG 
ATGAGCTAAG TAAACTAT^ GT^T^ ^eACACCAT TGTGTGATGA GCAGI.AGGG 
TGTCACTGTA GCT.TGAATT TGTTCATGTA GTGCC:aTTAC TAGOTATACO ATCCCCAACC 
TCCCACTCCA A^AGATAG CTTCTTATCA CAGTTCAGCA GCAGCGCACA CACACAGAAA 
CACACACACA GCCACATCCN TCAAAANTGG TC^TGGAGA CTTCTTTCTC ..TGACCGTT 
TAGTTTTCGT GAGCATAATT AAGTTACTCT ATACAATAAA A^TGAGTAA ATGGACACCA 
TAGATGTCTA AATAAATAAA CACATAAATA AAAAGATGAC ACTITCACAT AACACCATCA 
AACAGCTTCA TAAAATTATA TTATATAGAA TATTCTATAA TTATGTTGAT TTGTAACGCA 
CTOTAAAAAA AGOATTACTG CCTTAAATTG ATAATTTGTT GAAGAAAATT TACTTTCC.^ 
AACATTTATT GTATTAATAT ATTACAGTAC GCTCAATAAT ACATGTGAAA CTGCAGCTTC 
ATATTTTTAA ATGTTTTAAT GTATTTAATA TATATATATA TAATATTTAT ATATATATGT
ATGCATGTAT GCATATTTAT TCTGTTGAAA GGAGATTAGT TTTATTCAAC ACATTAGTTT 
TAATAACTCG TTTCTAATAA CTGATTTCTT TTATCTTTGT CATGATGACA GTAAATAATA 
TTTGACTAGA TATTTTTCAA GACATTTCTA TACCACTTAA AGTGACATTT AAAGGCTTAA 
CTAGGTTAAT TAGGTTAAGT AAGCAGGTTA GGGTAATTGG GTAAGTTATT GTACAACAAT 
GGTTTGTTCT GTAGACTATT GAAAAAAATG GCTTAAAGGG GCTAATAATT TTQTcCCTTA 
AAATGGTGTT TAAAAATGTA AACTGCTTTT ATTGTGGCTG AAAAAACAAA TAAGAATTTC 
TCCAGAAAAA AAAATATTAT CAGACACTGT GAAAATGTCC TTACTCTGTT AAACATAATT 
TGTGAAATAT GTAAAAAAGA ATAAAAAATT CaCATGGGGG GTGATAACTT CAACTACACA 
CACACACACA CACACACACA CACATTTCAQ tOAcCAAAAT ATGTTGTRGG TTTNTKTNTT 
CATTGATATA AAaTGTGCGA TGcCATTTCM AAAATCCATA TATAGTTTAT GCAACATTAT 
ATTgGJ\MCCA AAATAAGTaA TATACAAAAT AAGTAGTATT ATCTTATCCA GTATATTTGA 
GTATTTATAT ATCGAAGTTT AGATTCYTAA TTTAACAATA TTTATGAATT ATATGTTTAA 
GTTCTAAAAC AACACCTCAT GTAAATCAAT AACATGGTGC TTGGTACAGT ATGCTCAATA 
ATACATGAAA AACTGCAGCT TCATATTTAA AAATGTTATT GTATGCAATT ACATGTACAA 
TTACAAATAA CGTATGGTAA TGTATACAAA TATATATTTA GTAATAGAGG GTATAATATA 
TGTGATGCAC ATGCGAAAAA ATATATCACA CACACACGCA CGCACGCACA CACACACACA 
CACACACATT TATTTATGCA TATGTACACT ATAAAACCCA AAAAGTTAAA CTCAAACCAT 
TTAAGGAAAC TGATTGCAAC AAACCATTAA AGTTGAAAAA CGAATCCTAA TGAQTACTGT 
AAACTGAATN TATTTGAGTA AACQAAGCAA TTTGAGGACA GTAAAACCCA ATAAATGAAG 
AGAACTCAAA CCAACTGAGC ACTGTAAAAC CTAACAAGTT AAGGCAACTC AAACCGTTTG 
AGGAAATCGA TATAAGAGTC CTGTGAACTG TATTTAATTA ACTCATTACT TCAAAACTCT 
TTTCAAATTA GTAGAATTAA CATTCAGTAC ATTTTGAGTT ACTACACTCA TTTCATTTGA 
TAAAGTTGAC TGTTGGGTTT TACAGTGTAT CTTTTTATTA ATTTATATAA GAACATGTGT 
GGATAATATA AGTACATTTA TTAACATCAT TATATATGTG GCTTCAGCTT TATGCAAATG 
CTGAAAGTTA ACGAATTGAA ATCAATTAAG CATTTCAGTA ACATAACACG TATTGTAGGT 
TTTGTCTTCA TTGATATACA CATGCAATGC ATTTCAAGTC ATTTATAATT GATGCATTAT 
ATTGTATTGT ACCAATGTAA GTAATATATA ATATACTATA TTATATTATC CAGTATATTT 
GACTTTAAAA TATTAAAGTT TAGATTCCTA ATGTAACAAT ACATATATAA TATGTTAAGG 
TTCTAGAATG GAACCTTATG TAAATCAAWA ACCTGGCGCT TGGTGAAGGA TTTGCTTCTC 
TGRATCTCAt CCCAGTTTCC CTGAAAATTA TAAATGCACA ATGGTGGARG GAAGTTGAAA 
GTGtTTTGCC TGTCAAATGA RARTGACAGT CTTAGTCCtG TGCTCCGgCA GSCCGTTCTG 
CGTCCGTATC TCTCACCATG ATTGCAGCAT TKGAGTTTAT TTGCATTACT GTTCTTTGCT 
GAGCTGCACC AgGGGAAAAG TGCTTTTGCA TTTTCATTCG CTTTGTTCAC AGTCACCGTT 
TCCATCCCAA GTCCTC^T^ TTA.CACTTT OCACGCCAOT TTAATTCCCA A^TOTAT^AG
GCCACAGCAT ATGCTTAATT COTTTCAACA ATGAAACITT ATTAATGATG TGCTTGAATC 
ATAGATACTA TAAGrTTATG G^TTGTTGTAA AATTARGrTT CTCTGGCTGT CTGTGGGArr 
TTCCCAGCGC TGTTGGAITT GCGTCTTTAT CTATATrTAT AAOIXSAAgCC AT^TATATA 
ATCTCTGACA GTATTTTATT TAGATTAGAA ATTAAATACT AGTGTrn^ GTCTTGTrrc 
TATAGTATTA TTACTATTTT ^TGCATTAA TTTACAGAAG ATGCCTCATA AACTGAATTT 
AGTATAATAA TTTAAATACC AAAACATCAT TAGGTACATT TAAAATACCA ATCATGCAAA 
AAAATAACCC TTTGACTOCA CATTTACCCA ATGGGTGTCC ATTTTTGACT TrTTAAATAA 
TGGTTTACAC ACACATCA,^ GCTGGTTTAC AAAAAAATCA AACATAATTC ™CACGA 
CTACTCTGAA TT^GGTrPC A^CATTTTC TT^GCTA AGTCTOT^;, ^AATATGGA 
GTCGCCACAG CGGAATGAAT CGCCAACTTA rTTAGCATAT GTTTCACACA G^TGCCC 
TTCCAGCTGC AAACCATCAC TGGGAAACAT CCATACACTA TGGgACAATT TAGCCTACCC 
AATTCATCTG AACTGCATGT CTTTGCAGGg AAACCCACAC AAACAC3OGG GAGAACA^T 
TTGC^AAT TGTAAAAAAA CAACCAGAAA GCATAATAAA TGAGAATCTC AAATATrTTT 
ACCGCATACT TCAAAAATAA AGATGATTTA GTAITAAAAA ATG^ATT ^TGAATATtG 
en^AAATA AAr«.GSCTT ACaCTTAGTA TATGTAtXAA .TCCAGTACT TTTACCATAA 
ACCGACATAT CMACCATTtG GTAGAGGTtG ATAtTTTAGA AATGACgA«A WGTGTTGAAA 
AAAAtGCATC gAGTGTGTAg CAACArTAGG AHTTAAgTAT TGCAAtGCAA AAaTtGTAaO 
•mAATCAATt AGGGACtAAT TAWTCGTCAA ■nTTAAATl^T TATAATTTGc TACTrTTTCT 
CAAACCACTA GGTTTCACTG ATTATTCAGC AAAATGrPAT TCATCArrTT CAATTOATA 
TATTTTAACA TGAGCAGCAT ™aCTTTA ATATATACTG CACAAAAAAT AGTTACATTO 
TGTT^TAAG COTTTCC^T AO^TATTTAT ^^TGAGC AGTATATTTT TAAAAAGTGA 
GAATAAATAT GTAGC^G TTTTACATAA CCATATGATG CACTTAACGA XGATGAAACA 
TTTCATTCAT ATTXGGGGCA TTTTATTTTT ACTTATTriTT ^GAAAAAA TGGACACTAA 
CTGTGGXTTT AAXAXGAXXX CXAXGXAAAX AAAAXGACO. XXGGACAXrX AAXXTGAXGX 
ACACXGXAAA AAAAAXCCAA CCXXAAAXXX XAACXXAAAX CAAGXXAACC XXAXCAGXAC 
AXXGAACXXA AA^TAXGXXA AACXGACAXA AAACXGAA^ AAXAACXXAX AAAAXXAAGX 
TAGAACACCA XAGAXXAAXG ^ACAAXGAA CXAAAAACXG XCAXGACXAA XXGXTCAXAT 

ITAXAXXrrX acagxgxaga xgxggaacax ccagxcxxxg ^taxaaggx caxaxaggct 

AAAAXTfXAAX AAAACAXXXA AAXAGGAAXX AAAAXTTXXG TXXCXTAAXA XXXXXAXXGX 
AAXXXCCXAA CAXXXACXCA GXGAAACXAA XXXCAGXXXT GAXTCTTXCA CXAXAAXAXG 
XGXAXAXAXG XGXAXXAXAA AAAXAAXXXG XGXXCAAAAX AAAAXAAAAA AAXTXGCACA 
AXCCTCCACX AXXCAXTXGA ACXGAACXCA CAXGCXGXGX CAGCXAGAGA XCX^CCAXAX 
AAXAXXCAAA AXGGAAAGCG XGGCCACCCG XATGGXAGGA GXGXCCAAAA AAAAGXACCC 
CAACCCCACC CATTGGTGCC CTAC7VATTTC AAATGAACCT ACTAGTTCCC AAAGACTGAA

GGAGATAAGC AAGCAAACAG GCGGCTAGTT CACTCCATGA TCTGAGaATC TCCTGRYACT 

GATAAACGAC ATCTTCAATA CTACACTTGC AGGATCCACT AGT 

(2) INFORMATION FOR SEQ ID NO: 27: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4811 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: no

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
ATATTTTGGG TTATGGCTAA AATAATTAAT GTCTAAAACG GGATTACGCG TTTTTCGTAA 
AGCTCAAAGA CGCATGTGCC AAAAATAGCC TTTTATTAAA TTGTTTGGTT ATTAAAATAT 
TATTCAACTT ATTTTACATC CATGGAAAGA GACATG6CCT CTTCTATTTG ACCTGCATGT 
GTTAAAACGA AATGCCAAAA TAAAGAAAAA AATGTAATTC AACATGTAAG GCTATTCAAA 
AACAATACAC AGGTACAAAA CATATCTTTG TTAATGAAAC TAATTTACAG TTTGTTTATT 
AAAACACACT ATAAATGCCA TAGAACATTT TGGAGATGCA TGCGTTATAC ATTGCGTGAT 
TTAACAGATC AATTAAAGTC GTATTTTGCG CCAGCATTTC AATGGGCATA ACGACTTAAT 
GTTTTCCTCT AGAATGATTA CAAATGTGAA AGCGAATGTG ATGTGATTGA GTTQAAGAAT 
TAGTTTTTTT TGGAATGCCC CAAGGACGCA TGCATTAGCC CACCTGTGCT GTTTATTTAA 
ATCATTGACT CCAAGAGCTG TCAGCCACAA AAGGAGGGCG GGCGCGCTGT CATCACCCAT 
CAGATTTATG ACTGCCACAC AATCATTTTC CGACTAAACT AACGCCATCA TCACTCAGAA 
CAAGAACTTC ATGAGTCGCA CAAGACAAGT TATAATAAAT GCATTACAGC GAATGCATGC 
ACAAACGCGA GAACCACTTT TGCTGCAAAA TAATGTGGAT TGTTGGTTGA AATGAAAACT 
GGGTGAGATG CTTTTCTTTC AATCCCTGTT ATCCATGCTT CAGCAGAGGA CAGGAGGCTT 
GTGACTTTGC CTGTGCCTGT GTCTGCCCCC GAGTGCCCTG TCACAATCTA ATTACCCGTG 
AGTAAAGGAC AATACCGCTT CAGCTGGTCT GTGTCATTCC CCCTATATCC CAGTGCCTGC 
TTATTTTCAC AAACCCTTCT GCGCCGCTTT CTGCCCCCTC CTGCCCTCTT TTAACCCCAC 
GGA6AATGAT AAATGCGCGG TGAGGGAACG AACGGGCAAA GCCATTTCAC GGCACCTGTT 
AATTAAGGGA ATGATTGCCT CCATTTTTCG CTGAGCTCGT TTCCAGCGTG CTCCATTATT 
TGTGATGCGA TTAATTGAAA GCGAATGTGA CATCACAACG AAGGTGATGT CATTGTCGCC 
GTCACACAGT AGAACGACAG AGTTACATAA GAAATAAAGT CTGCATGCAT ACATTTATGC 
ATGGCGTTTT AAAGAAGAGC GCACACTGGG TTAGAGTCCT CGGTGGGGTC AGCCACTTCG 
GTAACACCCC AAGCATTCAA TGCTAAGCCC TTAAAAGGAC AGCGTCTTTT GTTCTAACAT 
CGAGAGCACC GGGATTACCA CAGGTATTTA GTTCAGGTAT TCTCTAAGAA TATTTAGCCC 
TAGGTGAGCT GAACCAAGAG CAGTCATTAG CGCTAAAACT GGCTCTGATG GGAAGGGCTA 
ACACAO^OVC ACACACACAC ACACACACAC ACACACACAT TATAATAAAT GTAAO^CAT
GTTTACAACA ACTCCGOCAG TOATGCO^CA TATTGGCGGC GTACATACAC TAAATGmT 
AATGTAGTCT GTAAGACTAG AGAATCAGAA ATTAATTTAC ACAGAAATTA CAAAAATAAA 
TACATGTrPA AATAGTTAAT AAACATAATT CAAATATGTA ATGTATTATC GT.5TATTTTA 
ACATTAATGG ATGAGGTGGT TCAAATGCAT TTTGCACAAA ATAAAATCGA AGCAGCTTCA 
AATCGTAAAO ATAATAGTCG GTAGCATTGA ATCTGCTTTA ACATTTACTT TTAGCGAAGG 
CTACTTTATT AAGGAAGCTC ATArTAACTC CCAAT^TG TCTGCTATTG CACCTOXG 
AGGTGTAGAC T^IX^TAAAAT GCATCACTGC ACAGCAAAAT CAACCOTCAT ATTATCCTGT 
ACATTCTAAT ^GTTGGCTT CAGGCTGCCA GGGCTCT.TG TGCTGTGTAG GGCCCCTGGC 
CAGATTCCAG ^TGITAAAA AGGGATTTAC OCATCTGATA TTGTCACACA ATAAGGACAA 
ATAGCCCGrr TGAGCAXCrr TATACAACCA ACGCTGACAG AGGTTCOX^o GT^AAGTGC 

™gtgttgc atttgtgctt aaa^gat.^ TTTGGTGTTC aaccctcact ggaaaaaaat 

CTTTTGATGC AAATCGGTGC GrTTAGATAA AAAGAAGCAA AGCCTAOAAC TAAAGCCTAG 
AATTTATATT GCACTGTAGA TGTGGATGGT TATGGGAAAG TTTTTTGAGA TACTGTGGGG 
CGAGTCACOG CGTCAGAGTG GCGGCCGGTA GGGGCTCTAA ACTCGCGCTC CAATTArPGC 
CTGTCAGTCA TCATCGCT^ AGATTAGAGC ATGCGGATTA AAACTCATGC CTTTAAATAA 
TAACAACAGC GTCAATAl^A TCAAAAAGAC ACATCACGCT TATTTAAAAT CTACGAAATG 
TGTTAAAGCA TAA^TAC TACTGGT^ ^GTTGTAGA CCTGAAATCC TGTCAGATAG 
AAAT^AACTA CCCGGACCAC TGGTAGTTAA GTCTCTCTTG T^rTATC.^ GATTGATCCA 
ACCAGACAAG CTAGTTAAAT TAATAA^ TAAGCGCAAA GCG^GGTAC AAGCAGTTAG 
AGGGAGAAAG GTGAGAAGAA GCAATACAAA GTAGCTAAAT TCACAATGCA TTACATX^TC 
CATTTTAOAA ATGAAACACG AGGArTTAAT GTTAAATGAA TACAGAGTAG CTATAATCAG 
CAATACAAAG TAGCTAAA.^ CAGCAATACA AAGTAGCTAA ATTCAGCAAT ACAAAGTAGC 
TATATTCAGC AATACAAAGT AGCTAAAl^c AGCAATACAA AGTAGCTATA TTCAGCAATA 
CAAAGTAGCT ATATTCAGCA ATACAAAGTA GCTAAATTCA GCAATACAAC GTAGCTATAC 
™tAGCTA TACACTGTAT CCATTTTAGA AATGCACACG ATGATTTTCT GrrAAAAATC 
ACTGCTCATT TGAATTAGAT TATTTGAATT GGAGC^ACA TTGCAXGTAA TTAGTAAGCA 
AATTCGGCTT AACAAA™ AAACGCGTTT TTTTTTCTCG ACTAAATTAA TTAAGAAAAT 
GTATTATTGA TGGGTGCAAA CAGTAACAAT TTATTAAACC CTCTATGCAA ATGAGGTGTT 
CAGCTGACTA ACCTGCATCC ACAGTTTATC TAAACGCTTA TCAAACTAAT TGGCGACGTT 
CrarCTTTCT GCCTGCGGTG GGCGAGCCTG CTGCTTGTTT TGCCACGAGA TAATTGTACG 
CAAGAATCAA CGAAGCTGCC CTAATGGCCA CCAArTGGCT rTATTTGGAC CTGCCCATGC 
GACCTGTCGG CACCTCCAAG AGACGGGCTC GCTAITAATA TGTAAAGTOA CGTTTGATCG 
CTTGAAACGG CATACAAAGA CAGTGTTTTC ACAAGAAGAA .^GTGACA ACTCATTTAA 
AACTATTAGA CGCGCAAGAA CAATAGCCCC CSUVrTTAGAQ ACCATAAAAT ACTCCTCCCC 3600

AATTAATGCC TGAGGTGCTA GGAGTTGAGT TTGCTTGCAT TAGGCACATA TCTCATGTGA 3660 

CACTTCAGTG TTACAGGTTT TGTTGTTTTA AGCTAATGTT AATGGTCAGG GAACAGCTCG 3720 

TAATCACAAT ATATATTTAA AACAAATGAT TATTATQAAT GCAATAGGCC AAATCGATAT 3780 

TCATTAATAG AATAGAGGCA TTTTAATACA TTTCTGCACA ATTAAAAATT AAATATAATC 3840 

CTGCAAGTCT ATAATTATAT TATTCACATC ATTTAATGTC CTAAAAATAA ATTTAAAAAA 3900 

TAGCATTAGG CTGCAACTTA GATTTTAGGC TTTTCTGTTA GCACTTGAGT AAAAAGACAT 3960 

CATTACACAC CATCAACGTG AAGCTCTAAA AAGGGTAAAA AGATCTCAAT AAATTGCTGC 4020 

GCTGAATGAT QAGTCTCTCA GCTCTCTGGA TGTGGAQCSIG TAGGCCGACA GTCGCCGTGG 4080 

CATTTCGGAA AGCATGCTGT CCGAGCCAAT GGCAGTCAGC GCGCTCTGCT ATTGGTTCCC 4140 

AGGGCGCTCA CTGCCAGCTC GTGTCCCCGC CCSVrGTTCGT AAGATATGGA ATCTACTGGC 4200 

GCCAGTTCCG ACAGTACACA GGCACAATTC ATTAATGAQA CTTCTCTCCG CTTTAGACAG 4260 

ACGCAGAGTT TTAGGGAGAC TTTAACAATC GGGCTQTGGA CAATTTAAAC CAGTGGCQAA 4320 

TTACGAACGT CAACAGGCAT CTTGAGGATT AACATTCTTT GCGCAGGACT AACACGGGAA 4380 

AAATAAACGC AGGATTGGAG TGCTGAAATG CAACTTTGCG CCGTGAGTAC TTCCCGATAG 4440 

TTATTTGAAA TTGCGAGCAT TTAATTOAGC GATTTAATTG ATTGACTACA AAAGTTAGCC 4500 

TACTTATATT AACTGAGGCG TCGTCGTGTG AATTAAQATC TGTCTTQCAC TGTGTTTAAC 4560 
GTCAACACTQ AGATGCTTCT ATCTGTTATT CTCTTACAGG TGTCCCTGGC CACCCTTGAA 4620 
TGCAAAGAAG CAGGACCTCT ACACTCCTTC AAAAATAAAA GCATGCTCAQ AAAGTAAACA 4680 
GAGCATCGCC ACCTGAAGCA TTAAGCTAAC GACAGATATT TTAATAATCT AACGGACTAT 4740 
AGTGGTGCTT TCGGGTCTGT AGTGTCAAGT AAACTTTTCC AAGCATTTTC TAAGCGCGGA 4800 
CACTTGAGAT G 4811
CLAIMS

1. A transgenic fish the cells of which contain an exogenous construct,
wherein the construct comprises homologous expression sequences operably
linked to a sequence encoding an expression product, wherein the expression
product is expressed only in specific cell lineages.

2. The transgenic fish of claim 1 wherein the expression sequences 
and the sequence encoding the expression product are not operably linked in 
nature. 

3. The transgenic fish of claim 1 wherein the expression product is
heterologous. 

4. The transgenic fish of claim 3 wherein the expression product is a
reporter protein. 

5. The transgenic fish of claim 4 wherein the reporter protein is
selected from the group consisting of β-galactosidase, chloramphenicol
acetyltransferase, and green fluorescent protein.

6. The transgenic fish of claim 5 wherein the reporter protein is green
fluorescent protein. 

7. The transgenic fish of claim 1 wherein the fish is selected from the
group consisting of zebrafish, medaka, trout, salmon, carp, tilapia, goldfish,
loach, and catfish.

8. The transgenic fish of claim 7 wherein the fish is zebrafish. 

9. The transgenic fish of claim 1 wherein the expression product is
expressed only in cells selected from the group consisting of blood cells, 
nerve cells, and skin cells. 

10. The transgenic fish of claim 9 wherein the expression product is 
expressed only in blood cells. 

11. The transgenic fish of claim 10 wherein the expression product is 
expressed only in erythroid progenitor cells. 

12. The transgenic fish of claim 9 wherein the expression product is 
expressed only in neurons. 
13. The transgenic fish of claim 1 wherein the expression sequences
are selected from the group consisting of GATA-1 expression sequences and 
GATA-2 expression sequences. 

14. The transgenic fish of claim 13 wherein the expression sequences 
comprise GATA-1 expression sequences. 

15. The transgenic fish of claim 13 wherein the expression sequences 
comprise GATA-2 expression sequences. 

16. The transgenic fish of claim 15 wherein the expression sequences
comprise the GATA-2 promoter operably linked to the neuron-specific 
enhancer of GATA-2. 

17. The transgenic fish of claim 15 wherein the expression sequences 
comprise the GATA-2 promoter operably linked to the blood-specific enhancer 
of GATA-2. 

18. The transgenic fish of claim 15 wherein the expression sequences 
comprise the GATA-2 promoter operably linked to the skin-specific enhancer 
of GATA-2. 

19. The transgenic fish of claim 1 wherein the transgenic fish 
developed from, or is the progeny of a transgenic fish developed from, an 
embryonic cell into which the construct was introduced. 

20. The transgenic fish of claim 1 wherein the expression product is 
expressed only in predetermined cell lineages. 

21. The transgenic fish of claim 1 wherein the exogenous construct is 
genetically linked to an identified mutant gene. 

22. The transgenic fish of claim 1 wherein the expression sequences
comprise a homologous promoter operably linked to a homologous enhancer. 

23. The transgenic fish of claim 22 wherein the expression sequences 
further comprise homologous 5' untranslated sequences operably linked to the 
promoter and the sequence encoding the expression product. 

24. The transgenic fish of claim 1 wherein the construct further 
comprises (a) intron sequences operably linked to the sequence encoding the 
expression product, (b) a polyadenylation signal operably linked to the 
sequence encoding the expression product, or both. 
25. Cells isolated from the transgenic fish of claim 1 wherein the cells
express the expression product. 

26. A method of making transgenic fish, the method comprising 

(a) introducing an exogenous construct into an embryonic cell of a first 
fish, wherein the construct comprises homologous expression sequences 
operably linked to a sequence encoding an expression product, and 

(b) allowing the egg cell or embryonic cells to develop into a second
fish, wherein the expression product is expressed only in specific cell lineages 
of the second fish. 

27. The method of claim 26 wherein the expression product is 
expressed only in predetermined cell lineages. 

28. The method of claim 26 wherein the method further comprises 
producing progeny of the second fish.

29. The method of claim 26 wherein the expression sequences and the 
sequence encoding the expression product are not operably linked in nature. 

30. The method of claim 26, wherein the expression sequences are
expression sequences of a fish gene, wherein the method further comprises 

(c) exposing the second fish or progeny of the second fish to a test
compound,

(d) detecting the expression product in the fish exposed to the test
compound, and

(e) comparing the pattern of expression of the expression product in the 
fish exposed to the test compound with the pattern of expression of the 
expression product in the second fish or progeny of the second fish not 
exposed to the test compound, 

wherein if the pattern of expression of the expression product in the 
fish exposed to the test compound differs from the pattern of expression in the 
fish not exposed to the test compound, then the test compound affects 
expression of the fish gene. 

31. The method of claim 26, wherein the expression sequences are 
expression sequences of a fish gene, wherein the method further comprises 






PCT/US98/11808 

WO 98/56902 



(c) detecting the expression product in the second fish or progeny of 
the second fish,

wherein the pattern of expression of the expression product in the 
second fish or progeny of the second fish identifies the pattern of expression 
of the fish gene. 

32. The method of claim 26, wherein the expression sequences are 
expression sequences of a fish gene, wherein the method further comprises

(c) crossing the second fish or progeny of the second fish to a third fish 
having an identified mutant gene to produce a fourth fish having both the
exogenous construct and the identified mutation, 

(d) detecting the expression product in the fourth fish or progeny of the 
fourth fish, and 

(e) comparing the pattern of expression of the expression product in the 
fourth fish or the progeny of the fourth fish with the pattern of expression of 
the expression product in the second fish, 

wherein if the pattern of expression of the expression product in the 
fourth fish or progeny of the fourth fish differs from the pattern of expression
in the second fish, then the mutant gene affects expression of the fish gene. 

33. The method of claim 26, wherein the method further comprises

(c) crossing the second fish or progeny of the second fish to a third fish 
having an identified mutant gene, wherein the exogenous construct and the 
mutant gene map to the same region of the genome, to produce a fourth fish 
having both the exogenous construct and the mutant gene, and 

(d) crossing the fourth fish to a fifth fish, wherein the fifth fish has 
neither the exogenous construct nor the mutant gene, to produce a sixth fish, 
wherein the sixth fish has both the exogenous construct and the mutant gene, 

wherein the mutant gene is marked by the exogenous construct in the 
sixth fish. 

34. The method of claim 33, wherein the method further comprises 
(e) crossing the sixth fish, or a progeny of the sixth fish, with a seventh 
fish, and 
(f) identifying progeny fish expressing the expression product, wherein
fish expressing the expression product have the mutant gene. 

35. The method of claim 26, wherein the construct comprises a
homologous promoter operably linked to a sequence encoding an expression
product, wherein the promoter is not operably linked to an enhancer, wherein
the method further comprises 

(c) detecting the expression product in the second fish or progeny of 
the second fish, 

wherein if the expression product is detected, then the exogenous 
construct is operably linked to an enhancer.

36. The method of claim 35 further comprising 

(d) isolating the enhancer from the second fish or progeny of the 
second fish. 

37. The method of claim 35 further comprising 

(d) determining the pattern of expression of the expression product in 
the second fish or progeny of the second fish, 

wherein the pattern of expression of the expression product in the
second fish or progeny of the second fish identifies the pattern of expression
of the enhancer. 
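Claims 35 to 37 describe an enhancer-trap screen: a promoter lacking an enhancer drives the expression product only where the inserted construct has become operably linked to an endogenous enhancer. The sketch below models only the decision logic, using a hypothetical `InsertionLine` record and made-up tissue labels; it flags lines in which the expression product is detected and reports their tissue pattern as the inferred enhancer pattern.

```python
from dataclasses import dataclass, field


@dataclass
class InsertionLine:
    """A fish line carrying the enhancerless promoter construct (hypothetical record)."""
    name: str
    tissues_expressing: set = field(default_factory=set)


def screen_enhancer_traps(lines):
    """Map each line with detectable expression to its expression pattern.

    Following claims 35 and 37: detection implies the construct is operably
    linked to an enhancer, and the pattern of expression reflects that
    enhancer's pattern. Lines with no detected expression are dropped."""
    return {
        line.name: frozenset(line.tissues_expressing)
        for line in lines
        if line.tissues_expressing
    }


lines = [
    InsertionLine("line1", {"notochord"}),
    InsertionLine("line2"),  # no expression detected: no enhancer trapped
    InsertionLine("line3", {"retina", "lens"}),
]
print(screen_enhancer_traps(lines))
```

Lines flagged this way would be the candidates from which the enhancer is isolated under claim 36.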

38. A method of identifying regulatory elements in sequences upstream 
of a gene of interest, the method comprising 

(a) introducing members of a set of exogenous constructs into separate 
embryonic cells, wherein each member of the set of constructs comprises a 
sequence encoding an expression product operably linked to upstream 
sequences of a homologous gene of interest, wherein the different members of 
the set have different regions of the upstream sequences deleted, 

(b) allowing the embryonic cells to develop into fish,

(c) detecting the expression product in the fish or progeny of the fish, and

(d) determining which regions of the upstream sequences are needed for 
expression of the expression product. 

39. The method of claim 38 wherein determining which regions of the 
upstream sequences are needed for expression is accomplished by comparing 






PCT/US98/11808 

WO 98/56902 



the expression of the expression product in fish into which different members 
of the set of exogenous constructs have been introduced,

wherein if the expression product is detected in cells of interest in a 
fish, then the exogenous construct introduced into that fish includes a
regulatory element for expression in the cells of interest, 

wherein if the expression product is not detected in cells of interest in a
fish, then the exogenous construct introduced into that fish does not include a 
regulatory element for expression in the cells of interest. 
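The comparison in claims 38 and 39 amounts to deletion mapping. The sketch below is a hypothetical illustration that assumes the upstream sequence is divided into named regions and that expression in the cells of interest depends on an element lying within a single region. Each construct that still expresses rules out the regions it deleted; each construct that fails to express confines the element to the regions it deleted.

```python
def candidate_regulatory_regions(all_regions, results):
    """Narrow down which upstream region holds the regulatory element.

    all_regions: names of the upstream regions (hypothetical labels).
    results: (deleted_regions, expressed) pairs, one per construct, where
    expressed is True if the expression product was detected in the cells
    of interest. Assumes the element lies within a single region."""
    candidates = set(all_regions)
    for deleted, expressed in results:
        deleted = set(deleted)
        if expressed:
            candidates -= deleted  # element was retained in this construct
        else:
            candidates &= deleted  # element was removed by this deletion
    return candidates


regions = ["A", "B", "C", "D"]  # distal to proximal (illustrative)
results = [
    ({"A"}, True),
    ({"A", "B"}, False),
    ({"C"}, True),
    ({"D"}, True),
]
print(candidate_regulatory_regions(regions, results))
```

An empty result would indicate that the single-region assumption does not hold, for example because redundant elements occupy more than one region.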

40. A nucleic acid construct comprising expression sequences derived 
from fish operably linked to a sequence encoding an expression product, 
wherein the expression sequences comprise a promoter operably linked to
an enhancer, wherein the expression product is expressed only in specific cell
lineages. 